Class MimeTypeIndexingFilter

    • Constructor Detail

      • MimeTypeIndexingFilter

        public MimeTypeIndexingFilter()
    • Method Detail

      • filter

        public NutchDocument filter(NutchDocument doc,
                                    Parse parse,
                                    Text url,
                                    CrawlDatum datum,
                                    Inlinks inlinks)
                             throws IndexingException
        Description copied from interface: IndexingFilter
        Adds fields or otherwise modifies the document that will be indexed for a parse. Unwanted documents can be removed from indexing by returning a null value.
        Specified by:
        filter in interface IndexingFilter
        Parameters:
        doc - document instance for collecting fields
        parse - parse data instance
        url - page url
        datum - crawl datum for the page (fetch datum from segment containing fetch status and fetch time)
        inlinks - page inlinks
        Returns:
        modified (or a new) document instance, or null (meaning the document should be discarded)
        Throws:
        IndexingException - if an error occurs during filtering
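
        The return contract described above (return the document to keep it, return null to discard it) can be illustrated with a minimal, self-contained sketch. It does not use the Nutch API: SketchDocument is a hypothetical stand-in for NutchDocument, and the allow-list of MIME types and the "type" field name are assumptions made for illustration only, not a description of how this class actually decides.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class MimeFilterSketch {

    // Hypothetical stand-in for NutchDocument: a simple bag of named fields.
    static final class SketchDocument {
        private final Map<String, String> fields = new HashMap<>();

        void add(String name, String value) {
            fields.put(name, value);
        }

        String get(String name) {
            return fields.get(name);
        }
    }

    // Assumed allow-list of MIME types; where the real filter gets its
    // rules is not stated in this documentation.
    private final Set<String> allowedTypes;

    MimeFilterSketch(Set<String> allowedTypes) {
        this.allowedTypes = allowedTypes;
    }

    // Mirrors only the documented contract: return the (possibly modified)
    // document to keep it, or null to have it dropped from indexing.
    SketchDocument filter(SketchDocument doc, String contentType) {
        if (contentType == null || !allowedTypes.contains(contentType)) {
            return null; // unwanted document: discard
        }
        doc.add("type", contentType); // example of adding a field
        return doc;
    }

    public static void main(String[] args) {
        MimeFilterSketch filter = new MimeFilterSketch(
                new HashSet<>(Arrays.asList("text/html", "application/pdf")));

        SketchDocument kept = filter.filter(new SketchDocument(), "text/html");
        SketchDocument dropped = filter.filter(new SketchDocument(), "image/png");

        System.out.println("text/html kept: " + (kept != null));
        System.out.println("image/png kept: " + (dropped != null));
    }
}
```

        A caller that chains several filters would typically stop as soon as one of them returns null, since there is no document left to pass on.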
      • main

        public static void main(String[] args)
                         throws IOException,
                                IndexingException
        Main method for invoking this tool.
        Parameters:
        args - run with no arguments to print help
        Throws:
        IOException - if there is a fatal I/O error processing the input args
        IndexingException - if there is a fatal error while indexing