
Package com.google.enterprise.connector.util.filter

Document filters transform the properties of their source Document. They can add, remove, or modify a document's properties, including the document content. Properties in which a filter has no interest are passed through unmodified. A document filter can also throw a SkippedDocumentException to prevent a document from being fed to the Google Search Appliance (GSA).

Multiple document filters may be chained together, forming a transformational document processing pipeline. Similar to a Unix command pipeline, the filters are linked together, each using the previous one as its source Document.

[Figure: pipeline diagram. Repository, then FilterA, FilterB, FilterC, then the Feed to the GSA.]

Documents are extracted from the Repository, and then supplied to the filter chain. FilterA gets first crack at the document, then FilterB, then finally FilterC, before the Document is added to the Feed and sent to the GSA.

Filter instances are manufactured anew for each Document by a DocumentFilterFactory, which wraps a new Filter around the supplied source Document.
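The wrapping described above can be illustrated with a minimal, self-contained sketch. It does not use the Connector Manager classes: `Doc`, `FilterFactory`, `fromMap`, `delete`, `copy`, and `chain` are hypothetical stand-ins invented for this illustration, and each property is assumed to hold a single String value, which is a simplification.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class FilterChainSketch {

  /** Hypothetical, simplified document: each property holds one String value. */
  interface Doc {
    String findProperty(String name);
    Set<String> getPropertyNames();
  }

  /** Hypothetical factory: wraps a new filter around the supplied source. */
  interface FilterFactory {
    Doc newFilter(Doc source);
  }

  /** A map-backed document, standing in for one extracted from the repository. */
  static Doc fromMap(final Map<String, String> props) {
    return new Doc() {
      public String findProperty(String name) { return props.get(name); }
      public Set<String> getPropertyNames() { return props.keySet(); }
    };
  }

  /** Filters that hide one property and pass all others through unmodified. */
  static FilterFactory delete(final String hidden) {
    return source -> new Doc() {
      public String findProperty(String name) {
        return name.equals(hidden) ? null : source.findProperty(name);
      }
      public Set<String> getPropertyNames() {
        Set<String> names = new LinkedHashSet<>(source.getPropertyNames());
        names.remove(hidden);
        return names;
      }
    };
  }

  /** Filters that expose the value of 'from' under the additional name 'to'.
   *  In this simplified model the copy hides any existing value of 'to'. */
  static FilterFactory copy(final String from, final String to) {
    return source -> new Doc() {
      public String findProperty(String name) {
        return source.findProperty(name.equals(to) ? from : name);
      }
      public Set<String> getPropertyNames() {
        Set<String> names = new LinkedHashSet<>(source.getPropertyNames());
        if (names.contains(from)) names.add(to);
        return names;
      }
    };
  }

  /** Applies the factories in order; each filter uses the previous one as its source. */
  static Doc chain(Doc source, List<FilterFactory> factories) {
    Doc current = source;
    for (FilterFactory factory : factories) {
      current = factory.newFilter(current);
    }
    return current;
  }

  public static void main(String[] args) {
    Map<String, String> props = new LinkedHashMap<>();
    props.put("HeadLine", "Jumping Frog Wins");
    props.put("SecretRecipe", "classified");
    Doc filtered = chain(fromMap(props),
        Arrays.asList(delete("SecretRecipe"), copy("HeadLine", "Title")));
    System.out.println(filtered.getPropertyNames());            // [HeadLine, Title]
    System.out.println(filtered.findProperty("Title"));         // Jumping Frog Wins
    System.out.println(filtered.findProperty("SecretRecipe"));  // null
  }
}
```

Because every filter is only a view over its source, nothing is copied until a property is actually read, and a fresh set of wrappers can be built cheaply for each document.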

Several filters are included in this package. They can modify property values and add, copy, rename, or remove properties.

You can also implement custom document filters. By extending AbstractDocumentFilter, you need only override one or two methods to implement a new filter.
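The "override one or two methods" pattern might look roughly like the following sketch. It is self-contained and does not depend on the Connector Manager: `PassThroughFilter` is a hypothetical stand-in for a pass-through base class, not the real AbstractDocumentFilter, and `Doc` and `TrimFilter` are likewise invented for illustration. Consult the AbstractDocumentFilter Javadoc for the actual method signatures.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class CustomFilterSketch {

  /** Hypothetical, simplified document: each property holds one String value. */
  interface Doc {
    String findProperty(String name);
    Set<String> getPropertyNames();
  }

  /** Hypothetical base class: passes every property through by default. */
  abstract static class PassThroughFilter {
    /** Override to change the value returned for a property. */
    protected String findProperty(Doc source, String name) {
      return source.findProperty(name);
    }

    /** Override to change which property names are visible. */
    protected Set<String> getPropertyNames(Doc source) {
      return source.getPropertyNames();
    }

    /** Wraps a new filter view around the supplied source document. */
    public final Doc newFilter(final Doc source) {
      return new Doc() {
        public String findProperty(String name) {
          return PassThroughFilter.this.findProperty(source, name);
        }
        public Set<String> getPropertyNames() {
          return PassThroughFilter.this.getPropertyNames(source);
        }
      };
    }
  }

  /** Custom filter: trims surrounding whitespace from one property's value. */
  static class TrimFilter extends PassThroughFilter {
    private final String propertyName;

    TrimFilter(String propertyName) {
      this.propertyName = propertyName;
    }

    @Override
    protected String findProperty(Doc source, String name) {
      String value = source.findProperty(name);
      // Only the configured property is touched; all others pass through.
      return (value != null && name.equals(propertyName)) ? value.trim() : value;
    }
  }

  /** A map-backed document, standing in for one extracted from the repository. */
  static Doc fromMap(final Map<String, String> props) {
    return new Doc() {
      public String findProperty(String name) { return props.get(name); }
      public Set<String> getPropertyNames() { return props.keySet(); }
    };
  }

  public static void main(String[] args) {
    Map<String, String> props = new HashMap<>();
    props.put("Author", "  Mark Twain ");
    props.put("Title", " Roughing It ");
    Doc filtered = new TrimFilter("Author").newFilter(fromMap(props));
    System.out.println("[" + filtered.findProperty("Author") + "]");  // [Mark Twain]
    System.out.println("[" + filtered.findProperty("Title") + "]");   // [ Roughing It ]
  }
}
```

Only `findProperty` is overridden here because the filter changes a value but not the set of property names; a filter that adds or removes properties would override the name-listing method as well.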

Document filters are configured in the Connector Manager's documentFilters.xml file, located in the web application's WEB-INF directory. Document filters defined here will be applied to all documents across all connector instances hosted by the Connector Manager.

Document filters may also be configured for individual connector instances in a connector's connectorInstance.xml (Advanced Configuration) or connectorDefaults.xml file. Connector-specific document filters will be applied before the Connector Manager's global document filters.

For example, a filter chain might be configured as follows:


   <bean id="DocumentFilters"
         class="com.google.enterprise.connector.util.filter.DocumentFilterChain">
     <constructor-arg>
       <list>
         <!-- Don't reveal the secret recipe! -->
         <bean id="FilterA"
               class="com.google.enterprise.connector.util.filter.DeletePropertyFilter">
           <property name="propertyName" value="SecretRecipe"/>
         </bean>
         <!-- Make news articles appear in title and author searches. -->
         <bean id="FilterB"
               class="com.google.enterprise.connector.util.filter.CopyPropertyFilter">
           <property name="propertyNameMap">
             <map>
               <entry key="HeadLine" value="Title"/>
               <entry key="ByLine" value="Author"/>
             </map>
           </property>
         </bean>
         <!-- Reveal authors behind noms de plume. -->
         <bean id="FilterC"
               class="com.google.enterprise.connector.util.filter.ModifyPropertyFilter">
           <property name="propertyName" value="Author"/>
           <property name="pattern" value="Mark Twain"/>
           <property name="replacement" value="Samuel Clemens"/>
           <property name="overwrite" value="false"/>
         </bean>
       </list>
     </constructor-arg>
   </bean>
   
Since:
2.8