Interface IndexingFilter

    • Field Detail

      • X_POINT_ID

        static final String X_POINT_ID
        The name of the extension point.
    • Method Detail

      • filter

        NutchDocument filter(NutchDocument doc,
                             Parse parse,
                             Text url,
                             CrawlDatum datum,
                             Inlinks inlinks)
                      throws IndexingException
        Adds fields or otherwise modifies the document that will be indexed for a parse. Unwanted documents can be removed from indexing by returning a null value.
        Parameters:
        doc - document instance for collecting fields
        parse - parse data instance
        url - page URL
        datum - crawl datum for the page (fetch datum from segment containing fetch status and fetch time)
        inlinks - page inlinks
        Returns:
        modified (or a new) document instance, or null (meaning the document should be discarded)
        Throws:
        IndexingException - if an error occurs during filtering
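
        The contract described above (add fields and return the document, or return null to discard it) can be illustrated with a minimal, self-contained sketch. It does not use the Nutch classes: Doc, FilterException, SimpleFilter, and SkipPdfFilter are hypothetical stand-ins invented for this illustration, and the filter signature is reduced to a document and a URL string. The ".pdf" rule is likewise only an assumed example of an "unwanted document".

```java
import java.util.LinkedHashMap;
import java.util.Map;

class IndexingFilterSketch {

    // Hypothetical stand-in for NutchDocument: a plain bag of named fields.
    static class Doc {
        final Map<String, String> fields = new LinkedHashMap<>();
    }

    // Hypothetical stand-in for IndexingException (checked, like the original).
    static class FilterException extends Exception {
        FilterException(String message) {
            super(message);
        }
    }

    // Simplified stand-in for the filter contract; the parse, datum and
    // inlinks parameters of the real method are left out of this sketch.
    interface SimpleFilter {
        Doc filter(Doc doc, String url) throws FilterException;
    }

    // Example filter: discards PDF URLs, otherwise records the URL as a field.
    static class SkipPdfFilter implements SimpleFilter {
        @Override
        public Doc filter(Doc doc, String url) throws FilterException {
            if (url == null) {
                throw new FilterException("missing url");
            }
            if (url.endsWith(".pdf")) {
                return null; // null means: do not index this document
            }
            doc.fields.put("url", url); // add a field to the document
            return doc;                 // return the modified instance
        }
    }

    // Demo helper: returns the stored "url" field, or null if the document
    // was discarded by the filter.
    static String indexedUrl(String url) {
        try {
            Doc result = new SkipPdfFilter().filter(new Doc(), url);
            return result == null ? null : result.fields.get("url");
        } catch (FilterException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(indexedUrl("https://example.org/index.html"));
        System.out.println(indexedUrl("https://example.org/report.pdf"));
    }
}
```

        A caller that receives null should skip the document rather than treat it as an error; failures are signalled separately through the exception.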