Class ArbitraryIndexingFilter
- java.lang.Object
-
- org.apache.nutch.indexer.arbitrary.ArbitraryIndexingFilter
-
- All Implemented Interfaces:
Configurable, IndexingFilter, Pluggable
public class ArbitraryIndexingFilter extends Object implements IndexingFilter
Adds arbitrary searchable fields to a document from the class and method the user identifies in the config. The user supplies the name of the field to add with the class and method names that supply the value. Example:
<property>
<name>index.arbitrary.function.count</name>
<value>1</value>
</property>
<property>
<name>index.arbitrary.fieldName.0</name>
<value>advisors</value>
</property>
<property>
<name>index.arbitrary.className.0</name>
<value>com.example.arbitrary.AdvisorCalculator</value>
</property>
<property>
<name>index.arbitrary.constructorArgs.0</name>
<value>Kirk</value>
</property>
<property>
<name>index.arbitrary.methodName.0</name>
<value>countAdvisors</value>
</property>
<property>
<name>index.arbitrary.methodArgs.0</name>
<value>Spock,McCoy</value>
</property>
To set more than one arbitrary field value, increment index.arbitrary.function.count and repeat the rest of these blocks with successive int values appended to the property names, e.g. fieldName.1, methodName.1, etc.
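For illustration, a user-supplied class matching the example configuration might look like the sketch below. AdvisorCalculator is hypothetical, and this page does not state the exact constructor and method signatures the filter expects, so the String varargs parameters and the String return type are assumptions. In a real deployment the class would live in the package named by index.arbitrary.className.0.

```java
// Hypothetical user class for the example configuration above.
// Assumption: constructor and method arguments arrive as strings.
public class AdvisorCalculator {
    private final String captain;

    // constructorArgs.0 is "Kirk" in the example configuration.
    public AdvisorCalculator(String... constructorArgs) {
        this.captain = constructorArgs.length > 0 ? constructorArgs[0] : "";
    }

    // methodArgs.0 is "Spock,McCoy" in the example configuration.
    // Counts the non-empty names that differ from the captain.
    public String countAdvisors(String... advisors) {
        int count = 0;
        for (String advisor : advisors) {
            String name = advisor.trim();
            if (!name.isEmpty() && !name.equals(captain)) {
                count++;
            }
        }
        return String.valueOf(count);
    }

    public static void main(String[] args) {
        AdvisorCalculator calc = new AdvisorCalculator("Kirk");
        // Split the comma-separated value the way a config reader might.
        String[] methodArgs = "Spock,McCoy".split(",");
        System.out.println(calc.countAdvisors(methodArgs)); // prints 2
    }
}
```

With this class, the value stored in the advisors field would be the string returned by countAdvisors.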
-
-
Field Summary
-
Fields inherited from interface org.apache.nutch.indexer.IndexingFilter
X_POINT_ID
-
-
Constructor Summary
Constructors
ArbitraryIndexingFilter()
-
Method Summary
NutchDocument filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
The ArbitraryIndexingFilter filter object uses reflection to instantiate the configured class and invoke the configured method.
Configuration getConf()
Get the Configuration object
void setConf(Configuration conf)
Set the Configuration object
void setIndexedConf(Configuration conf, int ndx)
Set the Configuration object for a specific set of values in the config
-
-
-
Method Detail
-
filter
public NutchDocument filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks) throws IndexingException
The ArbitraryIndexingFilter filter object uses reflection to instantiate the configured class and invoke the configured method. It requires a few configuration settings for adding arbitrary fields and values to the NutchDocument as searchable fields. See index.arbitrary.function.count and (possibly multiple instances, when index.arbitrary.function.count > 1) the following properties: index.arbitrary.fieldName.index, index.arbitrary.className.index, index.arbitrary.constructorArgs.index, index.arbitrary.methodName.index, and index.arbitrary.methodArgs.index in nutch-default.xml or nutch-site.xml, where index ranges from 0 to index.arbitrary.function.count - 1.
- Specified by:
filter in interface IndexingFilter
- Parameters:
doc - The NutchDocument object
parse - The relevant Parse object passing through the filter
url - URL to be filtered by the user-specified class
datum - The CrawlDatum entry
inlinks - The Inlinks containing anchor text
- Returns:
- filtered NutchDocument
- Throws:
IndexingException - if an error occurs during filtering
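The reflection step described for filter can be sketched with the standard java.lang.reflect API. This is a minimal illustration and not the filter's actual source: the Greeter class, the invokeConfigured helper, and the choice of String[] parameters for both the constructor and the method are assumptions made for the example.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Method;

public class ReflectionSketch {

    // Hypothetical stand-in for a user-supplied class.
    public static class Greeter {
        private final String prefix;

        public Greeter(String[] constructorArgs) {
            this.prefix = constructorArgs[0];
        }

        public String greet(String[] names) {
            return prefix + " " + String.join(" and ", names);
        }
    }

    // Loads the class by name, builds an instance, then calls the named method.
    // Assumption: both the constructor and the method take a single String[].
    public static Object invokeConfigured(String className, String[] constructorArgs,
            String methodName, String[] methodArgs) throws ReflectiveOperationException {
        Class<?> clazz = Class.forName(className);
        Constructor<?> constructor = clazz.getConstructor(String[].class);
        // Cast to Object so the array is passed as one argument, not spread as varargs.
        Object instance = constructor.newInstance((Object) constructorArgs);
        Method method = clazz.getMethod(methodName, String[].class);
        return method.invoke(instance, (Object) methodArgs);
    }

    public static void main(String[] args) throws Exception {
        Object value = invokeConfigured(Greeter.class.getName(),
                new String[] {"Hello"}, "greet", new String[] {"Spock", "McCoy"});
        System.out.println(value); // prints Hello Spock and McCoy
    }
}
```

A real filter would then add the returned value to the document under the configured field name and would need to decide how to handle a missing class or method, for example by wrapping the failure in an IndexingException.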
-
setConf
public void setConf(Configuration conf)
Set the Configuration object
- Specified by:
setConf in interface Configurable
-
setIndexedConf
public void setIndexedConf(Configuration conf, int ndx)
Set the Configuration object for a specific set of values in the config
- Parameters:
conf - The Configuration object holding values for the current arbitrary field.
ndx - The ordinal counter value for the current arbitrary field appended to the base property names in the xml configuration file.
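The way an ordinal is appended to the base property names can be sketched as follows. This is an assumption-laden illustration, not the body of setIndexedConf: java.util.Properties stands in for the Hadoop Configuration object so that the example is self-contained, and the IndexedKeysSketch class and valuesFor helper are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

public class IndexedKeysSketch {
    static final String PREFIX = "index.arbitrary.";
    static final String[] BASE_NAMES = {
        "fieldName", "className", "constructorArgs", "methodName", "methodArgs"
    };

    // Reads the five properties for one arbitrary field, e.g. fieldName.0, className.0.
    // Missing properties are returned as empty strings.
    public static Map<String, String> valuesFor(Properties conf, int ndx) {
        Map<String, String> values = new LinkedHashMap<>();
        for (String base : BASE_NAMES) {
            String key = PREFIX + base + "." + ndx;
            values.put(base, conf.getProperty(key, ""));
        }
        return values;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("index.arbitrary.function.count", "1");
        conf.setProperty("index.arbitrary.fieldName.0", "advisors");
        conf.setProperty("index.arbitrary.className.0", "com.example.arbitrary.AdvisorCalculator");
        conf.setProperty("index.arbitrary.constructorArgs.0", "Kirk");
        conf.setProperty("index.arbitrary.methodName.0", "countAdvisors");
        conf.setProperty("index.arbitrary.methodArgs.0", "Spock,McCoy");

        // The ordinal ranges from 0 to index.arbitrary.function.count - 1.
        int count = Integer.parseInt(conf.getProperty("index.arbitrary.function.count", "0"));
        for (int ndx = 0; ndx < count; ndx++) {
            System.out.println(ndx + " -> " + valuesFor(conf, ndx));
        }
    }
}
```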
-
getConf
public Configuration getConf()
Get the Configuration object
- Specified by:
getConf in interface Configurable
-
-