Package org.apache.nutch.net
Interface URLExemptionFilter
All Superinterfaces:
Configurable, Pluggable
All Known Implementing Classes:
ExemptionUrlFilter
public interface URLExemptionFilter extends Pluggable, Configurable
Interface used to allow exemptions to external domain resources by overriding db.ignore.external.links. This is useful when the crawl is focused on a single domain but resources such as images are hosted on a CDN.
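To make the CDN use case concrete, here is a minimal sketch of what an implementation might look like. It is not the bundled ExemptionUrlFilter. So that it compiles without Nutch on the classpath, it declares a hypothetical stand-in interface (ExemptionFilterSketch) that carries only the filter(String, String) method; a real plugin would implement URLExemptionFilter and also supply getConf and setConf. The class name CdnImageExemptionFilter, the host allow-list, and the image extensions are all assumptions made for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for org.apache.nutch.net.URLExemptionFilter.
// The real interface also extends Pluggable and Configurable.
interface ExemptionFilterSketch {
    boolean filter(String fromUrl, String toUrl);
}

// Hypothetical implementation: exempts image outlinks hosted on allow-listed CDN hosts.
class CdnImageExemptionFilter implements ExemptionFilterSketch {
    private final Set<String> cdnHosts; // lower-case host names, assumed to be supplied by the caller

    CdnImageExemptionFilter(Set<String> cdnHosts) {
        this.cdnHosts = cdnHosts;
    }

    @Override
    public boolean filter(String fromUrl, String toUrl) {
        try {
            URI target = new URI(toUrl);
            String host = target.getHost();
            String path = target.getPath();
            if (host == null || path == null) {
                return false; // not a hierarchical URL with a host: no exemption
            }
            String lowerPath = path.toLowerCase(Locale.ROOT);
            boolean isImage = lowerPath.endsWith(".png")
                    || lowerPath.endsWith(".jpg")
                    || lowerPath.endsWith(".gif");
            // Exempt only images served from an allow-listed CDN host.
            return isImage && cdnHosts.contains(host.toLowerCase(Locale.ROOT));
        } catch (URISyntaxException e) {
            return false; // malformed destination URL: no exemption
        }
    }
}
```

This sketch ignores fromUrl; an implementation could equally use it, for example to exempt CDN resources only when they are linked from particular source pages.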
Field Summary
Fields
Modifier and Type    Field         Description
static String        X_POINT_ID    The name of the extension point.
Method Summary
All Methods    Instance Methods    Abstract Methods
Modifier and Type    Method                                  Description
boolean              filter(String fromUrl, String toUrl)    Checks whether toUrl is exempted when ignoring external links is enabled.
Methods inherited from interface org.apache.hadoop.conf.Configurable
getConf, setConf
Field Detail
X_POINT_ID
static final String X_POINT_ID
The name of the extension point.
Method Detail
filter
boolean filter(String fromUrl, String toUrl)
Checks whether toUrl is exempted when ignoring external links (db.ignore.external.links) is enabled.
Parameters:
fromUrl - the source URL that generated the outlink
toUrl - the destination URL to check for exemption
Returns:
true when toUrl is exempted from the db.ignore.external.links restriction
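The contract above can be illustrated from the caller's side. The following is a rough sketch, not Nutch's actual outlink handling: it assumes that "external" means a different host than fromUrl, and it uses a BiPredicate<String, String> as a hypothetical stand-in for filter(fromUrl, toUrl). The class and method names (OutlinkSelectionSketch, keepOutlinks, hostOf) are invented for this example.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.BiPredicate;

class OutlinkSelectionSketch {

    // Returns the lower-cased host of a URL, or null if it cannot be determined.
    static String hostOf(String url) {
        try {
            String host = URI.create(url).getHost();
            return host == null ? null : host.toLowerCase(Locale.ROOT);
        } catch (IllegalArgumentException e) {
            return null;
        }
    }

    // Keeps every outlink when external links are not ignored. Otherwise keeps
    // same-host outlinks plus any external outlink the exemption filter accepts.
    static List<String> keepOutlinks(String fromUrl,
                                     List<String> outlinks,
                                     boolean ignoreExternalLinks,
                                     BiPredicate<String, String> exemptionFilter) {
        String fromHost = hostOf(fromUrl);
        List<String> kept = new ArrayList<>();
        for (String toUrl : outlinks) {
            if (!ignoreExternalLinks) {
                kept.add(toUrl);
                continue;
            }
            boolean internal = fromHost != null && fromHost.equals(hostOf(toUrl));
            // The filter is consulted only for links that would otherwise be dropped.
            if (internal || exemptionFilter.test(fromUrl, toUrl)) {
                kept.add(toUrl);
            }
        }
        return kept;
    }
}
```

In this sketch, a return value of true from the filter rescues an external outlink that would otherwise be discarded; internal links never reach the filter.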