Class SimilarityScoringFilter
java.lang.Object
  org.apache.nutch.scoring.AbstractScoringFilter
    org.apache.nutch.scoring.similarity.SimilarityScoringFilter
- All Implemented Interfaces: Configurable, Pluggable, ScoringFilter
public class SimilarityScoringFilter extends AbstractScoringFilter
-
-
Field Summary
-
Fields inherited from interface org.apache.nutch.scoring.ScoringFilter
X_POINT_ID
-
-
Constructor Summary
Constructor    Description
SimilarityScoringFilter()
-
Method Summary
Modifier and Type, Method, Description
CrawlDatum distributeScoreToOutlinks(Text fromUrl, ParseData parseData, Collection<Map.Entry<Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount)
    Distribute score value from the current page to all its outlinked pages.
Configuration getConf()
void passScoreAfterParsing(Text url, Content content, Parse parse)
    Currently a part of score distribution is performed using only data coming from the parsing process.
void setConf(Configuration conf)
-
Methods inherited from class org.apache.nutch.scoring.AbstractScoringFilter
generatorSortValue, indexerScore, initialScore, injectedScore, passScoreBeforeParsing, updateDbScore
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface org.apache.nutch.scoring.ScoringFilter
orphanedScore
-
Method Detail
-
getConf
public Configuration getConf()
- Specified by: getConf in interface Configurable
- Overrides: getConf in class AbstractScoringFilter
-
setConf
public void setConf(Configuration conf)
- Specified by: setConf in interface Configurable
- Overrides: setConf in class AbstractScoringFilter
-
passScoreAfterParsing
public void passScoreAfterParsing(Text url, Content content, Parse parse) throws ScoringFilterException
Description copied from interface: ScoringFilter
Part of score distribution currently uses only data that comes from the parsing process. This method ensures that score data is present in those steps.
- Specified by: passScoreAfterParsing in interface ScoringFilter
- Overrides: passScoreAfterParsing in class AbstractScoringFilter
- Parameters:
url - page URL
content - original content. NOTE: modifications to this value are not persisted.
parse - target instance to copy the score information to. Implementations may modify this in place, primarily by setting some metadata properties.
- Throws:
ScoringFilterException - if there is a fatal error processing score data in subsequent steps after parsing
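The contract above (read the content, write score information into the parse metadata) can be illustrated with a small, self-contained sketch. It does not use the Nutch classes: plain strings stand in for the parsed text, a Map<String,String> stands in for the parse metadata, and SCORE_KEY is a hypothetical key name. This page does not describe how SimilarityScoringFilter computes its score, so the cosine measure over term frequencies below is only an illustrative assumption.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Sketch only: stand-in types, not the Nutch API.
final class AfterParsingSketch {
    // Hypothetical metadata key; the key used by the real filter is not stated here.
    static final String SCORE_KEY = "sketch.score";

    // Builds a term-frequency vector from lower-cased word tokens.
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1, Integer::sum);
            }
        }
        return tf;
    }

    // Cosine similarity of two term-frequency vectors; 0.0 if either is empty.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        for (int v : b.values()) {
            normB += v * v;
        }
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / Math.sqrt(normA * normB);
    }

    // Stand-in for passScoreAfterParsing: the parsed text is only read,
    // and the computed score is written into the (mutable) parse metadata.
    static void passScoreAfterParsing(String parsedText, String referenceText,
                                      Map<String, String> parseMeta) {
        double score = cosine(termFrequencies(parsedText), termFrequencies(referenceText));
        parseMeta.put(SCORE_KEY, Double.toString(score));
    }
}
```

Calling AfterParsingSketch.passScoreAfterParsing with the same text for both arguments stores a score of 1.0 under SCORE_KEY, while texts with no words in common store 0.0. A later step can then read the value back from the metadata.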
-
distributeScoreToOutlinks
public CrawlDatum distributeScoreToOutlinks(Text fromUrl, ParseData parseData, Collection<Map.Entry<Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount) throws ScoringFilterException
Description copied from interface: ScoringFilter
Distribute score value from the current page to all its outlinked pages.
- Specified by: distributeScoreToOutlinks in interface ScoringFilter
- Overrides: distributeScoreToOutlinks in class AbstractScoringFilter
- Parameters:
fromUrl - URL of the source page
parseData - ParseData instance, which stores relevant score value(s) in its metadata. NOTE: filters may modify this in place; all changes will be persisted.
targets - <url, CrawlDatum> pairs. NOTE: filters can modify this in place; all changes will be persisted.
adjust - a CrawlDatum instance, initially null, which implementations may use to pass adjustment values to the original CrawlDatum. When creating this instance, set its status to CrawlDatum.STATUS_LINKED.
allCount - number of all collected outlinks from the source page
- Returns:
if needed, implementations may return an instance of CrawlDatum, with status CrawlDatum.STATUS_LINKED, which contains adjustments to be applied to the original CrawlDatum score(s) and metadata. This can be null if not needed.
- Throws:
ScoringFilterException - if there is a fatal error distributing score data from the current page to all of its outlinks
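The parameter contract above (targets modified in place, adjust initially null, a null return when no adjustment is needed) can be sketched without the Nutch classes. In the sketch below, Datum is a hypothetical stand-in for CrawlDatum that carries only a score, String stands in for Text, and the page score is passed directly instead of being read from ParseData metadata. The equal-share policy (page score divided by allCount) is an assumption chosen to show the role of allCount; this page does not say which policy SimilarityScoringFilter applies.

```java
import java.util.Collection;
import java.util.Map;

// Sketch only: stand-in types, not the Nutch API.
final class OutlinkDistributionSketch {
    // Hypothetical stand-in for CrawlDatum, reduced to a score field.
    static final class Datum {
        float score;

        Datum(float score) {
            this.score = score;
        }
    }

    // Gives every target an equal share of the page score (assumed policy).
    // Targets are updated in place; adjust is returned unchanged, so a caller
    // that passes null gets null back, meaning "no adjustment needed".
    static Datum distributeScoreToOutlinks(float pageScore,
                                           Collection<Map.Entry<String, Datum>> targets,
                                           Datum adjust,
                                           int allCount) {
        if (allCount <= 0) {
            return adjust; // nothing to distribute
        }
        float share = pageScore / allCount;
        for (Map.Entry<String, Datum> target : targets) {
            target.getValue().score = share;
        }
        return adjust;
    }
}
```

Note that allCount may be larger than the size of targets, because it counts all collected outlinks rather than only the ones passed in. The sketch divides by allCount for that reason.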