Document filters act to transform their source Document's Properties.
Interface | Description |
---|---|
DocumentFilterFactory | An interface for factories that create Document filters. |
Class | Description |
---|---|
AbstractDocumentFilter | A base Document filter implementation that does nothing. |
AclPropertyFilter | A Document filter that forces the SpiConstants.CaseSensitivityType field for all ACL Principals supplied by the connector to be set to a specified value. |
AddPropertyFilter | |
CopyPropertyFilter | A Document filter that copies a Property's values to another property. |
DeletePropertyFilter | A Document filter that removes the specified Properties from the document. |
DocumentFilterChain | DocumentFilterChain constructs a chain of Document filters. |
ModifyPropertyFilter | A Document filter that alters the values of the specified Properties. |
MovePropertyFilter | |
SkipDocumentFilter | |

Document filters act to transform their source Document's Properties. Document filters can add, remove, or modify a document's properties, including the document content. Properties in which the filter has no interest are passed through unmodified. A document filter might even throw a SkippedDocumentException to prevent a document from being fed to the Google Search Appliance.
Multiple document filters may be chained together, forming a transformational document processing pipeline. Similar to a Unix command pipeline, the filters are linked together, each using the previous one as its source Document.
Documents are extracted from the Repository, and then supplied to the filter chain. FilterA gets first crack at the document, then FilterB, then finally FilterC, before the Document is added to the Feed and sent to the GSA.
Filter instances are manufactured anew for each Document by a DocumentFilterFactory, which wraps a new Filter around the supplied source Document.
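The wrapping described above can be sketched in plain Java. This is a minimal illustration only: `SimpleDocument`, `SimpleFilterFactory`, and `FilterChainSketch` are hypothetical stand-ins invented for this example, not the Connector Manager's actual `Document` or `DocumentFilterFactory` types, and properties are reduced to lists of strings. It shows each factory wrapping a new filter around the previous filter's output, with uninteresting properties passed through.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a source Document: looks up property values by name.
interface SimpleDocument {
    List<String> findProperty(String name); // null when the property is absent
}

// Hypothetical stand-in for a filter factory: wraps a new filter around a source.
interface SimpleFilterFactory {
    SimpleDocument newFilter(SimpleDocument source);
}

final class FilterChainSketch {
    // A "repository" document backed by a map.
    static SimpleDocument fromMap(Map<String, List<String>> props) {
        return name -> props.get(name);
    }

    // Hides one property; every other property is passed through unmodified.
    static SimpleFilterFactory delete(String hidden) {
        return source -> name -> name.equals(hidden) ? null : source.findProperty(name);
    }

    // Adds the values of `from` to `to`; other properties pass through.
    static SimpleFilterFactory copy(String from, String to) {
        return source -> name -> {
            if (!name.equals(to)) {
                return source.findProperty(name);
            }
            List<String> merged = new ArrayList<>();
            List<String> own = source.findProperty(to);
            List<String> copied = source.findProperty(from);
            if (own != null) merged.addAll(own);
            if (copied != null) merged.addAll(copied);
            return merged.isEmpty() ? null : merged;
        };
    }

    // Each factory wraps the document produced by the previous one,
    // so the first factory in the list sees the repository document first.
    static SimpleDocument apply(List<SimpleFilterFactory> chain, SimpleDocument source) {
        SimpleDocument current = source;
        for (SimpleFilterFactory factory : chain) {
            current = factory.newFilter(current);
        }
        return current;
    }

    public static void main(String[] args) {
        Map<String, List<String>> props = new LinkedHashMap<>();
        props.put("SecretRecipe", List.of("11 herbs"));
        props.put("HeadLine", List.of("Frog Jumps"));

        SimpleDocument filtered = apply(
            List.of(delete("SecretRecipe"), copy("HeadLine", "Title")),
            fromMap(props));

        System.out.println(filtered.findProperty("SecretRecipe")); // prints null
        System.out.println(filtered.findProperty("Title"));        // prints [Frog Jumps]
    }
}
```

Because the filters are created per document and only wrap their source, nothing is computed until a property is actually requested from the outermost filter.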
Several filters are included in this package, providing the ability to add, copy, rename, or remove properties, and to modify property values.
You can also implement custom document filters. By extending AbstractDocumentFilter, you need only override one or two methods to implement a new filter.
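The "override one or two methods" idea can be illustrated with a self-contained sketch. The types here (`Doc`, `MapDoc`, `PassThroughFilter`, `UpperCaseFilter`) are hypothetical stand-ins written for this example; they are not the real SPI `Document` or `AbstractDocumentFilter`, whose exact method signatures should be checked against the API reference. The base class passes everything through, and the custom filter overrides a single method.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a source Document.
interface Doc {
    Set<String> getPropertyNames();
    List<String> findProperty(String name); // null when the property is absent
}

// Simple map-backed document used to exercise the filter.
final class MapDoc implements Doc {
    private final Map<String, List<String>> props;
    MapDoc(Map<String, List<String>> props) { this.props = props; }
    public Set<String> getPropertyNames() { return props.keySet(); }
    public List<String> findProperty(String name) { return props.get(name); }
}

// Hypothetical stand-in for a do-nothing base filter: both hooks delegate to the source.
abstract class PassThroughFilter {
    // Override to change which property names the filtered document reports.
    protected Set<String> getPropertyNames(Doc source) {
        return source.getPropertyNames();
    }

    // Override to change the values returned for a property.
    protected List<String> findProperty(Doc source, String name) {
        return source.findProperty(name);
    }

    // Wraps a new filtered view around the supplied source document.
    final Doc newDocumentFilter(Doc source) {
        PassThroughFilter self = this;
        return new Doc() {
            public Set<String> getPropertyNames() { return self.getPropertyNames(source); }
            public List<String> findProperty(String name) { return self.findProperty(source, name); }
        };
    }
}

// Custom filter: upper-cases the values of one property and leaves the rest alone.
final class UpperCaseFilter extends PassThroughFilter {
    private final String target;

    UpperCaseFilter(String target) { this.target = target; }

    @Override
    protected List<String> findProperty(Doc source, String name) {
        List<String> values = source.findProperty(name);
        if (values == null || !name.equals(target)) {
            return values; // not our property: pass through unmodified
        }
        List<String> upper = new ArrayList<>();
        for (String value : values) {
            upper.add(value.toUpperCase(Locale.ROOT));
        }
        return upper;
    }
}
```

Under these assumptions, `new UpperCaseFilter("Title").newDocumentFilter(doc)` yields a view of `doc` whose `Title` values are upper-cased, while the property names and all other values are unchanged.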
Document filters are configured in the Connector Manager's documentFilters.xml file, located in the web application's WEB-INF directory. Document filters defined here will be applied to all documents across all connector instances hosted by the Connector Manager.
Document filters may also be configured for individual connector instances in a connector's connectorInstance.xml (Advanced Configuration) or connectorDefaults.xml file. Connector-specific document filters will be applied before the Connector Manager's global document filters.
For example, a filter chain might be configured as follows:

<bean id="DocumentFilters"
      class="com.google.enterprise.connector.util.filter.DocumentFilterChain">
  <constructor-arg>
    <list>
      <!-- Don't reveal the secret recipe! -->
      <bean id="FilterA"
            class="com.google.enterprise.connector.util.filter.DeletePropertyFilter">
        <property name="propertyName" value="SecretRecipe"/>
      </bean>
      <!-- Make news articles appear in title and author searches. -->
      <bean id="FilterB"
            class="com.google.enterprise.connector.util.filter.CopyPropertyFilter">
        <property name="propertyNameMap">
          <map>
            <entry key="HeadLine" value="Title"/>
            <entry key="ByLine" value="Author"/>
          </map>
        </property>
      </bean>
      <!-- Reveal authors behind noms de plume. -->
      <bean id="FilterC"
            class="com.google.enterprise.connector.util.filter.ModifyPropertyFilter">
        <property name="propertyName" value="Author"/>
        <property name="pattern" value="Mark Twain"/>
        <property name="replacement" value="Samuel Clemens"/>
        <property name="overwrite" value="false"/>
      </bean>
    </list>
  </constructor-arg>
</bean>