Package org.apache.nutch.parse
Class HTMLMetaTags
java.lang.Object
    org.apache.nutch.parse.HTMLMetaTags
public class HTMLMetaTags extends Object
This class holds the information about HTML "meta" tags extracted from a page. Some special tags have convenience methods for easy checking.
-
-
Constructor Summary
Constructor        Description
HTMLMetaTags()
-
Method Summary
Modifier and Type  Method                           Description
URL                getBaseHref()
Metadata           getGeneralTags()
Properties         getHttpEquivTags()
boolean            getNoCache()                     Get the current value of noCache.
boolean            getNoFollow()                    Get the current value of noFollow.
boolean            getNoIndex()                     Get the current value of noIndex.
boolean            getRefresh()                     Get the current value of refresh.
URL                getRefreshHref()
int                getRefreshTime()
void               reset()                          Sets all boolean values to false.
void               setBaseHref(URL baseHref)        Sets the baseHref.
void               setCache()                       Sets noCache to false.
void               setFollow()                      Sets noFollow to false.
void               setIndex()                       Sets noIndex to false.
void               setNoCache()                     Sets noCache to true.
void               setNoFollow()                    Sets noFollow to true.
void               setNoIndex()                     Sets noIndex to true.
void               setRefresh(boolean refresh)      Sets refresh to the supplied value.
void               setRefreshHref(URL refreshHref)  Sets the refreshHref.
void               setRefreshTime(int refreshTime)  Sets the refreshTime.
String             toString()
-
-
-
Method Detail
-
reset
public void reset()
Sets all boolean values to false. Clears all other tags.
-
setNoFollow
public void setNoFollow()
Sets noFollow to true.
-
setFollow
public void setFollow()
Sets noFollow to false.
-
setNoIndex
public void setNoIndex()
Sets noIndex to true.
-
setIndex
public void setIndex()
Sets noIndex to false.
-
setNoCache
public void setNoCache()
Sets noCache to true.
-
setCache
public void setCache()
Sets noCache to false.
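The six setters above form three on/off pairs. As a rough illustration of how a caller might drive them, the sketch below maps the comma-separated content of a robots meta tag onto those pairs. RobotsFlags is a hypothetical stand-in that mirrors only the documented setters and getters, so the example compiles without Nutch on the classpath. applyRobotsContent is likewise an illustrative helper, not Nutch's own parsing code, and treating "noarchive" as the trigger for setNoCache() is an assumption of this sketch.

```java
import java.util.Locale;

class RobotsFlagsSketch {

    /** Hypothetical stand-in mirroring the documented flag setters and getters. */
    static class RobotsFlags {
        private boolean noIndex;
        private boolean noFollow;
        private boolean noCache;

        void setNoIndex()  { noIndex = true; }
        void setIndex()    { noIndex = false; }
        void setNoFollow() { noFollow = true; }
        void setFollow()   { noFollow = false; }
        void setNoCache()  { noCache = true; }
        void setCache()    { noCache = false; }

        boolean getNoIndex()  { return noIndex; }
        boolean getNoFollow() { return noFollow; }
        boolean getNoCache()  { return noCache; }
    }

    /** Illustrative helper: applies each directive found in a robots "content" value. */
    static RobotsFlags applyRobotsContent(String content) {
        RobotsFlags flags = new RobotsFlags();
        for (String raw : content.split(",")) {
            String directive = raw.trim().toLowerCase(Locale.ROOT);
            switch (directive) {
                case "noindex":   flags.setNoIndex();  break;
                case "index":     flags.setIndex();    break;
                case "nofollow":  flags.setNoFollow(); break;
                case "follow":    flags.setFollow();   break;
                case "noarchive": flags.setNoCache();  break; // assumed mapping
                default:          break; // unknown directives are ignored in this sketch
            }
        }
        return flags;
    }

    public static void main(String[] args) {
        RobotsFlags flags = applyRobotsContent("NOINDEX, follow");
        System.out.println("noIndex=" + flags.getNoIndex()
                + " noFollow=" + flags.getNoFollow()
                + " noCache=" + flags.getNoCache());
    }
}
```

Because each pair has a setter for both states, a later directive simply overrides an earlier one; no separate "unset" call is needed.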
-
setRefresh
public void setRefresh(boolean refresh)
Sets refresh to the supplied value.
Parameters:
refresh - value to set
-
setBaseHref
public void setBaseHref(URL baseHref)
Sets the baseHref.
Parameters:
baseHref - value to set
-
setRefreshHref
public void setRefreshHref(URL refreshHref)
Sets the refreshHref.
Parameters:
refreshHref - value to set
-
setRefreshTime
public void setRefreshTime(int refreshTime)
Sets the refreshTime.
Parameters:
refreshTime - value to set
-
getNoIndex
public boolean getNoIndex()
Get the current value of noIndex.
Returns:
true if no index is desired, false otherwise
-
getNoFollow
public boolean getNoFollow()
Get the current value of noFollow.
Returns:
true if no follow is desired, false otherwise
-
getNoCache
public boolean getNoCache()
Get the current value of noCache.
Returns:
true if no cache is desired, false otherwise
-
getRefresh
public boolean getRefresh()
Get the current value of refresh.
Returns:
true if refresh is desired, false otherwise
-
getBaseHref
public URL getBaseHref()
Returns:
the baseHref, if set, or null otherwise.
-
getRefreshHref
public URL getRefreshHref()
Returns:
the refreshHref, if set, or null otherwise. The value may be invalid if getRefresh() returns false.
-
getRefreshTime
public int getRefreshTime()
Returns:
the current value of refreshTime. The value may be invalid if getRefresh() returns false.
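Since both getRefreshHref() and getRefreshTime() may hold invalid values when getRefresh() returns false, a caller would normally check the flag first. The following is a minimal sketch of that guard. RefreshView is a hypothetical interface exposing only the three documented getters, and describe and fixed are illustrative helpers; none of them are part of Nutch.

```java
import java.net.URI;
import java.net.URL;

class RefreshGuardSketch {

    /** Hypothetical stand-in exposing only the three documented refresh getters. */
    interface RefreshView {
        boolean getRefresh();
        URL getRefreshHref();
        int getRefreshTime();
    }

    /** Reads the href and time only after confirming that refresh is set. */
    static String describe(RefreshView tags) {
        if (!tags.getRefresh()) {
            return "no refresh requested";
        }
        URL target = tags.getRefreshHref();
        String where = (target == null) ? "(no target URL set)" : target.toString();
        return "refresh to " + where + ", time=" + tags.getRefreshTime();
    }

    /** Fixed-value implementation used only for this demonstration. */
    static RefreshView fixed(boolean refresh, URL href, int time) {
        return new RefreshView() {
            public boolean getRefresh() { return refresh; }
            public URL getRefreshHref() { return href; }
            public int getRefreshTime() { return time; }
        };
    }

    public static void main(String[] args) throws Exception {
        URL next = new URI("https://example.org/next").toURL();
        System.out.println(describe(fixed(true, next, 5)));
        System.out.println(describe(fixed(false, next, 5)));
    }
}
```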
-
getGeneralTags
public Metadata getGeneralTags()
Returns:
all collected values of the general meta tags. Property names are tag names, property values are "content" values.
-
getHttpEquivTags
public Properties getHttpEquivTags()
Returns:
all collected values of the "http-equiv" meta tags. Property names are tag names, property values are "content" values.
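As an illustration of reading such a Properties object, the sketch below looks up one tag by name. The description above does not say how tag names are cased, so the hypothetical helper find compares names case-insensitively instead of assuming a convention. The Properties instance built in main is a stand-in for the object returned by getHttpEquivTags().

```java
import java.util.Properties;

class HttpEquivLookupSketch {

    /** Case-insensitive lookup of one tag name; returns null when the tag is absent. */
    static String find(Properties httpEquiv, String tagName) {
        for (String name : httpEquiv.stringPropertyNames()) {
            if (name.equalsIgnoreCase(tagName)) {
                return httpEquiv.getProperty(name);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Stand-in for the object returned by getHttpEquivTags().
        Properties httpEquiv = new Properties();
        httpEquiv.setProperty("content-type", "text/html; charset=UTF-8");
        System.out.println(find(httpEquiv, "Content-Type"));
    }
}
```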
-
-