Package org.apache.nutch.indexer.basic
Class BasicIndexingFilter
java.lang.Object
  org.apache.nutch.indexer.basic.BasicIndexingFilter
All Implemented Interfaces:
Configurable, IndexingFilter, Pluggable
public class BasicIndexingFilter extends Object implements IndexingFilter
Adds basic searchable fields to a document. The fields added are: domain, host, url, content, title, cache, tstamp.
domain is included depending on indexer.add.domain in nutch-default.xml.
title is truncated as per indexer.max.title.length in nutch-default.xml. (As per NUTCH-1004, a zero-length title is not added.)
content is truncated as per indexer.max.content.length in nutch-default.xml.
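The title and content rules described above can be illustrated with a small, dependency-free sketch. It is not the Nutch implementation: the class TitleContentRules, its method names, and the treatment of a negative limit as "no truncation" are assumptions made for illustration. The real filter operates on a NutchDocument and reads its limits from the configuration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in that mimics the documented title/content rules. */
public class TitleContentRules {

    /** Cuts text to maxLength characters; a negative maxLength is assumed to mean "no limit". */
    public static String truncate(String text, int maxLength) {
        if (maxLength >= 0 && text.length() > maxLength) {
            return text.substring(0, maxLength);
        }
        return text;
    }

    /** Builds only the title and content fields; the other fields are omitted in this sketch. */
    public static Map<String, String> buildFields(String title, String content,
                                                  int maxTitleLength, int maxContentLength) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("content", truncate(content, maxContentLength));
        String shortTitle = truncate(title, maxTitleLength);
        // Per NUTCH-1004, a zero-length title is not added.
        if (!shortTitle.isEmpty()) {
            fields.put("title", shortTitle);
        }
        return fields;
    }

    public static void main(String[] args) {
        // Title limited to 6 characters, content unlimited.
        System.out.println(buildFields("Apache Nutch", "Some page text", 6, -1));
        // Empty title is skipped, content limited to 4 characters.
        System.out.println(buildFields("", "Some page text", 6, 4));
    }
}
```

Keeping truncation in one helper means the same rule applies to both fields, and the empty-title check runs after truncation so a limit of 0 also suppresses the title field in this sketch.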
Field Summary
Fields inherited from interface org.apache.nutch.indexer.IndexingFilter
X_POINT_ID
Constructor Summary
Constructor | Description
BasicIndexingFilter() |
Method Summary
Modifier and Type | Method | Description
NutchDocument | filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks) | The BasicIndexingFilter filter object, which supports a few configuration settings for adding basic searchable fields.
Configuration | getConf() | Get the Configuration object.
void | setConf(Configuration conf) | Set the Configuration object.
Method Detail
filter
public NutchDocument filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks) throws IndexingException
The BasicIndexingFilter filter object, which supports a few configuration settings for adding basic searchable fields. See indexer.add.domain, indexer.max.title.length, indexer.max.content.length in nutch-default.xml.
Specified by:
filter in interface IndexingFilter
Parameters:
doc - The NutchDocument object
parse - The relevant Parse object passing through the filter
url - URL to be filtered for anchor text
datum - The CrawlDatum entry
inlinks - The Inlinks containing anchor text
Returns:
filtered NutchDocument
Throws:
IndexingException - if an error occurs during filtering
setConf
public void setConf(Configuration conf)
Set the Configuration object.
Specified by:
setConf in interface Configurable
getConf
public Configuration getConf()
Get the Configuration object.
Specified by:
getConf in interface Configurable
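As a rough illustration of the setConf/getConf pair, the sketch below shows a filter-like class that stores its configuration and caches the three settings named on this page. java.util.Properties stands in for the Hadoop Configuration class so the example has no dependencies. The class name ConfigurableSketch, its accessor names, and the fallback values are assumptions for this sketch, so check nutch-default.xml for the actual defaults.

```java
import java.util.Properties;

/** Hypothetical stand-in for a Configurable filter; Properties replaces Hadoop's Configuration. */
public class ConfigurableSketch {
    private Properties conf;
    private boolean addDomain;
    private int maxTitleLength;
    private int maxContentLength;

    /** Stores the configuration and caches the settings the filter would need. */
    public void setConf(Properties conf) {
        this.conf = conf;
        // The fallback values below are assumptions made for this sketch.
        this.addDomain = Boolean.parseBoolean(conf.getProperty("indexer.add.domain", "false"));
        this.maxTitleLength = Integer.parseInt(conf.getProperty("indexer.max.title.length", "100"));
        this.maxContentLength = Integer.parseInt(conf.getProperty("indexer.max.content.length", "-1"));
    }

    /** Returns the configuration object passed to setConf. */
    public Properties getConf() {
        return conf;
    }

    public boolean isAddDomain() {
        return addDomain;
    }

    public int getMaxTitleLength() {
        return maxTitleLength;
    }

    public int getMaxContentLength() {
        return maxContentLength;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("indexer.add.domain", "true");
        props.setProperty("indexer.max.title.length", "80");

        ConfigurableSketch filter = new ConfigurableSketch();
        filter.setConf(props);
        System.out.println(filter.isAddDomain() + " "
                + filter.getMaxTitleLength() + " "
                + filter.getMaxContentLength());
    }
}
```

Reading the settings once in setConf, rather than on every filter call, keeps per-document work small. Whether BasicIndexingFilter does exactly this is not stated on this page.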