Package org.apache.nutch.indexer.feed
Class FeedIndexingFilter
- java.lang.Object
-
- org.apache.nutch.indexer.feed.FeedIndexingFilter
-
- All Implemented Interfaces:
- Configurable, IndexingFilter, Pluggable
public class FeedIndexingFilter extends Object implements IndexingFilter
- Since:
- NUTCH-444
An IndexingFilter implementation that pulls the relevant extracted Metadata fields out of RSS feeds and into the index.
- Author:
- dogacan, mattmann
-
-
Field Summary
Fields (Modifier and Type, Field, Description):
static String dateFormatStr
-
Fields inherited from interface org.apache.nutch.indexer.IndexingFilter
X_POINT_ID
-
-
Constructor Summary
Constructors (Constructor, Description):
FeedIndexingFilter()
-
Method Summary
Methods, all concrete instance methods (Modifier and Type, Method, Description):
NutchDocument filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks)
Extracts the relevant fields (FEED_AUTHOR, FEED_TAGS, FEED_PUBLISHED, FEED_UPDATED, FEED) and sends them to the Indexer for indexing within the Nutch index.
Configuration getConf()
void setConf(Configuration conf)
Sets the Configuration object used to configure this IndexingFilter.
-
-
-
Field Detail
-
dateFormatStr
public static final String dateFormatStr
- See Also:
- Constant Field Values
-
-
Method Detail
-
filter
public NutchDocument filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks) throws IndexingException
Extracts the relevant fields:
- FEED_AUTHOR
- FEED_TAGS
- FEED_PUBLISHED
- FEED_UPDATED
- FEED
and sends them to the Indexer for indexing within the Nutch index.
- Specified by:
- filter in interface IndexingFilter
- Parameters:
- doc - document instance for collecting fields
- parse - parse data instance
- url - page url
- datum - crawl datum for the page (fetch datum from segment containing fetch status and fetch time)
- inlinks - page inlinks
- Returns:
- modified (or a new) document instance, or null (meaning the document should be discarded)
- Throws:
- IndexingException - if an error occurs during filtering
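The contract described above (copy a fixed set of feed fields from the parse metadata into the document, then return the document) can be illustrated with a minimal, self-contained sketch. It does not use the Nutch API: plain maps stand in for NutchDocument and for the parse metadata, the class name FeedFieldCopySketch is hypothetical, and the string keys simply reuse the constant names listed above rather than their actual values, which this page does not show.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in, NOT the Nutch implementation: maps replace
// NutchDocument and the parse metadata so the sketch runs on its own.
public class FeedFieldCopySketch {

    // Assumption: keys are the constant names from the Javadoc, used as
    // literal strings; the real constant values may differ.
    static final List<String> FEED_FIELDS = List.of(
            "FEED_AUTHOR", "FEED_TAGS", "FEED_PUBLISHED", "FEED_UPDATED", "FEED");

    // Copies every feed field present in the metadata into the document
    // and returns the same (modified) document instance.
    static Map<String, List<String>> filter(Map<String, List<String>> doc,
                                            Map<String, List<String>> parseMeta) {
        for (String field : FEED_FIELDS) {
            List<String> values = parseMeta.get(field);
            if (values == null) {
                continue; // field absent from this feed entry; nothing to index
            }
            doc.computeIfAbsent(field, k -> new ArrayList<>()).addAll(values);
        }
        return doc;
    }

    public static void main(String[] args) {
        Map<String, List<String>> meta = new HashMap<>();
        meta.put("FEED_AUTHOR", List.of("dogacan"));
        meta.put("FEED_TAGS", List.of("nutch", "rss"));
        meta.put("unrelated", List.of("x")); // not a feed field, so it is ignored

        Map<String, List<String>> doc = filter(new LinkedHashMap<>(), meta);
        System.out.println(doc); // {FEED_AUTHOR=[dogacan], FEED_TAGS=[nutch, rss]}
    }
}
```

The sketch always returns the document it was given. A real IndexingFilter may instead return a new document, or null to discard the page, as the Returns entry above states.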
-
getConf
public Configuration getConf()
- Specified by:
- getConf in interface Configurable
- Returns:
- the Configuration object used to configure this IndexingFilter.
-
setConf
public void setConf(Configuration conf)
Sets the Configuration object used to configure this IndexingFilter.
- Specified by:
- setConf in interface Configurable
- Parameters:
- conf - The Configuration object used to configure this IndexingFilter.
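The setConf/getConf pair follows the usual configurable-component pattern: the caller hands over a configuration object once, and the component keeps it for later lookups. The following is a rough sketch of that pattern only. It uses java.util.Properties as a stand-in for Configuration, and the class name ConfigurableSketch, the key "sketch.date.format", and its default value are all made up for illustration; none of them come from Nutch.

```java
import java.util.Properties;

// Hypothetical stand-in, NOT the Nutch class: Properties replaces the
// Configuration type so the sketch has no external dependencies.
public class ConfigurableSketch {

    private Properties conf;   // configuration handed in by the caller
    private String dateFormat; // made-up setting, cached when configured

    // Stores the configuration and reads settings from it once.
    public void setConf(Properties conf) {
        this.conf = conf;
        // "sketch.date.format" is an invented key with an invented default.
        this.dateFormat = conf.getProperty("sketch.date.format", "yyyy-MM-dd");
    }

    // Returns the same configuration object that was passed to setConf.
    public Properties getConf() {
        return conf;
    }

    public String getDateFormat() {
        return dateFormat;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        ConfigurableSketch filter = new ConfigurableSketch();
        filter.setConf(props);
        System.out.println(filter.getConf() == props); // true
        System.out.println(filter.getDateFormat());    // yyyy-MM-dd
    }
}
```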
-
-