public class SkipDocumentFilter extends Object implements MetadataTransform
The config key propertyName names the document property to test (the name comes from Connectors v3, where Documents had Properties). If the config has a pattern that is a regular expression, the filter checks whether any value of the propertyName property matches that regular expression. If pattern is not set, the match is determined by whether any key named propertyName is present. The key skipOnMatch determines whether to skip the matching documents (if that key is set to true) or to skip all but the matching documents (if that key is set to false). By default, both the document's Metadata and params are searched for the matching propertyName; the config key corpora may be set to metadata or to params to restrict the search to only Metadata or params, respectively. Most keys/values of interest will normally be specified in the document's Metadata, but some keys/values of interest (e.g. ContentType, DocId) exist in the document's params.
Example: skip documents that have a NoIndex metadata key or params key, regardless of value:
metadata.transform.pipeline=skipDocumentFilter
metadata.transform.pipeline.skipDocumentFilter.factoryMethod=com.google.enterprise.adaptor.prebuilt.SkipDocumentFilter.create
metadata.transform.pipeline.skipDocumentFilter.propertyName=NoIndex
Example 2: skip documents whose Metadata Classification property is neither PUBLIC nor DECLASSIFIED:
metadata.transform.pipeline=skipDocumentFilter
metadata.transform.pipeline.skipDocumentFilter.factoryMethod=com.google.enterprise.adaptor.prebuilt.SkipDocumentFilter.create
metadata.transform.pipeline.skipDocumentFilter.propertyName=Classification
metadata.transform.pipeline.skipDocumentFilter.pattern=(PUBLIC)|(DECLASSIFIED)
metadata.transform.pipeline.skipDocumentFilter.skipOnMatch=false
metadata.transform.pipeline.skipDocumentFilter.corpora=metadata
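The matching rules described above can be sketched in plain Java. This is only an illustration under stated assumptions, not the library's code: the class SkipDecisionSketch and its shouldSkip method are hypothetical, the Metadata object is stood in for by a Map of key to list of values, a null pattern means "pattern is not set", and a value is assumed to match when the regular expression is found anywhere in it (use matches() instead of find() if whole-value matching is intended).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical stand-alone sketch; this is not the library's SkipDocumentFilter. */
final class SkipDecisionSketch {

  /**
   * Returns true when the document should be skipped.
   *
   * @param metadata stand-in for the adaptor's Metadata (key to values)
   * @param params the document's params map
   * @param propertyName the key to look for
   * @param pattern regular expression, or null when pattern is not set
   * @param skipOnMatch true to skip matches, false to skip everything else
   * @param corpora "metadata", "params", or null to search both
   */
  static boolean shouldSkip(Map<String, List<String>> metadata,
      Map<String, String> params, String propertyName, Pattern pattern,
      boolean skipOnMatch, String corpora) {
    List<String> values = new ArrayList<>();
    boolean keyPresent = false;

    // Search Metadata unless the search is restricted to params.
    if (!"params".equals(corpora) && metadata.containsKey(propertyName)) {
      keyPresent = true;
      values.addAll(metadata.get(propertyName));
    }
    // Search params unless the search is restricted to Metadata.
    if (!"metadata".equals(corpora) && params.containsKey(propertyName)) {
      keyPresent = true;
      values.add(params.get(propertyName));
    }

    boolean matched;
    if (pattern == null) {
      // No pattern: the mere presence of the key is the match.
      matched = keyPresent;
    } else {
      // Pattern set: any one matching value is enough.
      matched = false;
      for (String value : values) {
        if (value != null && pattern.matcher(value).find()) {
          matched = true;
          break;
        }
      }
    }
    // skipOnMatch=true skips matches; skipOnMatch=false skips non-matches.
    return matched == skipOnMatch;
  }
}
```

With the settings from Example 2 (pattern (PUBLIC)|(DECLASSIFIED), skipOnMatch=false, corpora=metadata), a document whose Classification is SECRET is skipped and one whose Classification is PUBLIC is kept.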
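Nested classes/interfaces inherited from interface MetadataTransform: MetadataTransform.HistoricalWrapper, MetadataTransform.TransmissionDecision
Fields inherited from interface MetadataTransform: KEY_CONTENT_TYPE, KEY_CRAWL_ONCE, KEY_DISPLAY_URL, KEY_DOC_ID, KEY_LAST_MODIFIED_MILLIS_UTC, KEY_LOCK, KEY_TRANSMISSION_DECISION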
Modifier and Type | Method and Description |
---|---|
static SkipDocumentFilter | create(Map<String,String> cfg) |
String | toString() |
void | transform(Metadata metadata, Map<String,String> params) Adds a single Map.Entry to the params Map: key Transmission-Decision, value TransmissionDecision.AS_IS to indicate that the document is to be retained, or value TransmissionDecision.DO_NOT_INDEX to indicate that the document is to be skipped. |
public void transform(Metadata metadata, Map<String,String> params)
Adds a single Map.Entry to the params Map: key Transmission-Decision, value TransmissionDecision.AS_IS to indicate that the document is to be retained, or value TransmissionDecision.DO_NOT_INDEX to indicate that the document is to be skipped. The decision is based on settings of the propertyName, pattern, skipOnMatch, and corpora configuration variables (as discussed above).
Specified by: transform in interface MetadataTransform
Parameters:
metadata - the document's Metadata
params - extra contextual information
public static SkipDocumentFilter create(Map<String,String> cfg)
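To illustrate what the cfg argument of create might contain, the sketch below derives a Map<String,String> from properties written like the examples above. It assumes, without confirmation from this page, that the factory receives keys with the metadata.transform.pipeline.skipDocumentFilter. prefix already removed (propertyName, pattern, skipOnMatch, corpora). The class TransformConfigSketch and its subConfig helper are hypothetical and only use java.util.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

/** Hypothetical helper; not part of the adaptor library. */
final class TransformConfigSketch {

  /** Collects every property under the given prefix, with the prefix removed. */
  static Map<String, String> subConfig(Properties props, String prefix) {
    Map<String, String> cfg = new TreeMap<>();
    for (String name : props.stringPropertyNames()) {
      if (name.startsWith(prefix)) {
        cfg.put(name.substring(prefix.length()), props.getProperty(name));
      }
    }
    return cfg;
  }

  public static void main(String[] args) throws IOException {
    Properties props = new Properties();
    props.load(new StringReader(
        "metadata.transform.pipeline=skipDocumentFilter\n"
        + "metadata.transform.pipeline.skipDocumentFilter.propertyName=Classification\n"
        + "metadata.transform.pipeline.skipDocumentFilter.pattern=(PUBLIC)|(DECLASSIFIED)\n"
        + "metadata.transform.pipeline.skipDocumentFilter.skipOnMatch=false\n"
        + "metadata.transform.pipeline.skipDocumentFilter.corpora=metadata\n"));

    // Assumed shape of the map handed to SkipDocumentFilter.create(cfg).
    Map<String, String> cfg =
        subConfig(props, "metadata.transform.pipeline.skipDocumentFilter.");
    System.out.println(cfg);
  }
}
```

In normal use the adaptor framework builds this map and calls the factoryMethod itself, so a sketch like this is mainly useful for reasoning about which keys the filter sees.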