Google Search Appliance software version 6.2
Connector manager version 2.4.0
Posted December 2009
This section provides an overview of the connector manager Service Provider Interface (SPI). The SPI consists of interfaces, classes, and methods that the connector manager calls in a connector to implement a connector instance, traverse documents, and authenticate and authorize users. Javadoc for the SPI is provided on the connector manager open source site. For connector terminology definitions, see the Google Enterprise Glossary.
The connector manager SPI consists of interfaces and methods that specify authentication, authorization, and traversal of content management system documents, and that enable you to supply forms that appear in the Admin Console so that administrators can add or change connector parameters. The connector manager communicates with a connector using the SPI methods. The connector does not communicate directly with the search appliance.
The connector manager SPI provides the following interfaces:
The com.google.enterprise.connector.spi package defines the connector
framework. You need to provide classes that implement these interfaces. The SimpleXXX classes
provide simple implementations of the SPI interfaces. In accordance with Java programming
convention, use of these classes is optional. You can implement the required interfaces by
using the simple classes directly, by subclassing them, or by writing your own implementations.
The Connector interface provides connector-specific functionality
that supports a content management system.
Connector sessions run in a stateless condition. Ensure that the connection with a content management system is available each time the connector starts a task. The connector manager keeps calling the connector to get new documents.
The DocumentList object that a traversal returns enables a connector
to specify how the connector manager handles traversal. If, after reading documents,
no more updated documents remain to be traversed, the connector manager waits 5 minutes
before resuming a traversal. Alternatively, a connector can return a list with zero
documents to tell the connector manager to set a checkpoint and then resume traversing.
Otherwise, the connector manager traverses and processes documents continuously.
The connector manager starts a new session only if the servlet container restarts or
an unrecoverable error occurs. This means that the login method
is called once and not again unless the connector manager terminates unexpectedly.
The connector implementation must therefore maintain its own access to the content
management system.
The Connector interface instantiation sequence is as follows:

When the connector manager discovers a connector, the connector manager calls
the login method of the connector to create a new instance of a
connector by instantiating the Connector interface. The login
method starts access to managers for authentication, authorization, and traversal. The
base use of the login method returns a Session object that
passes data and objects between the connector manager and a connector.
The login method gets a session with sufficient privileges to perform
all actions related to the SPI. A connector supplies credentials for the content
management system in the TraversalManager interface.
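The relationship between the login method and the Session object can be sketched as follows. This is a minimal illustration, not the SPI itself: the nested interfaces are simplified local stand-ins modeled on the SPI types named above (the real declarations are in the com.google.enterprise.connector.spi package and differ in detail), and ExampleConnector and its repositoryUrl field are hypothetical.

```java
/** Sketch only: the nested types are simplified stand-ins for the SPI types. */
class LoginSketch {
  interface AuthenticationManager {}
  interface AuthorizationManager {}
  interface TraversalManager {}

  /** Stand-in for Session: hands out the three managers. */
  interface Session {
    AuthenticationManager getAuthenticationManager();
    AuthorizationManager getAuthorizationManager();
    TraversalManager getTraversalManager();
  }

  /** Stand-in for Connector: login is the single entry point. */
  interface Connector {
    Session login();
  }

  /** Hypothetical connector; repositoryUrl is an assumed configuration value. */
  static class ExampleConnector implements Connector {
    private final String repositoryUrl;

    ExampleConnector(String repositoryUrl) {
      this.repositoryUrl = repositoryUrl;
    }

    @Override
    public Session login() {
      // A real connector would open its content management system
      // connection here, using repositoryUrl and its stored credentials.
      final AuthenticationManager authn = new AuthenticationManager() {};
      final AuthorizationManager authz = new AuthorizationManager() {};
      final TraversalManager traversal = new TraversalManager() {};
      // The managers are created once per session and reused on every call.
      return new Session() {
        @Override public AuthenticationManager getAuthenticationManager() { return authn; }
        @Override public AuthorizationManager getAuthorizationManager() { return authz; }
        @Override public TraversalManager getTraversalManager() { return traversal; }
      };
    }
  }
}
```

Because login is normally called only once, the sketch creates the managers inside login and returns the same instances for the lifetime of the session.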
The Session interface provides the following methods:
The Session.getAuthenticationManager method
returns the AuthenticationManager class object.
The AuthenticationManager interface functions
are shown in the following illustration:

The AuthenticationManager interface provides
the authenticate method, which returns an
AuthenticationResponse object. The AuthenticationResponse class
provides an isValid method to determine whether
a username and password are valid for controlled-access documents.
The authenticate method takes an
AuthenticationIdentity object as its parameter.
The AuthenticationIdentity interface has the
getDomain, getPassword, and
getUsername methods.
The AuthenticationManager interface routes
authentication requests from the search appliance to your connector.
If your connector supplies documents by URL or your connector is
only intended for search appliances that are configured with SSO,
then you need not implement this interface.
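A connector-side authenticate implementation might look like the following sketch. The nested types are simplified stand-ins modeled on the SPI names above, not the real SPI declarations. MapAuthenticationManager, the identity helper, and the in-memory password map are hypothetical; a real connector would pass the credentials to its content management system.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch only: simplified stand-ins for the authentication SPI types. */
class AuthenticationSketch {
  interface AuthenticationIdentity {
    String getUsername();
    String getPassword();
    String getDomain();
  }

  static final class AuthenticationResponse {
    private final boolean valid;
    AuthenticationResponse(boolean valid) { this.valid = valid; }
    boolean isValid() { return valid; }
  }

  interface AuthenticationManager {
    AuthenticationResponse authenticate(AuthenticationIdentity identity);
  }

  /** Hypothetical manager that checks credentials against an in-memory map. */
  static class MapAuthenticationManager implements AuthenticationManager {
    private final Map<String, String> passwords = new HashMap<>();

    MapAuthenticationManager() {
      passwords.put("alice", "s3cret");  // illustrative data only
    }

    @Override
    public AuthenticationResponse authenticate(AuthenticationIdentity identity) {
      String expected = passwords.get(identity.getUsername());
      boolean ok = expected != null && expected.equals(identity.getPassword());
      return new AuthenticationResponse(ok);
    }
  }

  /** Hypothetical helper that builds an identity with no domain. */
  static AuthenticationIdentity identity(final String username, final String password) {
    return new AuthenticationIdentity() {
      @Override public String getUsername() { return username; }
      @Override public String getPassword() { return password; }
      @Override public String getDomain() { return null; }
    };
  }
}
```

In this sketch an unknown user and a wrong password both produce a response whose isValid method returns false, so the caller cannot tell the two cases apart.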
The Session.getAuthorizationManager method returns
the AuthorizationManager class object.
The AuthorizationManager interface functions are
shown in the following illustration:

The AuthorizationManager interface provides the
authorizeDocids method to authorize multiple documents
from a content management system. The authorizeDocids
method returns a collection of AuthorizationResponse
class objects that indicate which documents a user is authorized
to view. The methods of this interface route authorization requests
from the search appliance to your connector. If your connector
only supplies public (world-readable) documents, then you need
not implement this interface.
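The batch nature of authorizeDocids can be illustrated with the sketch below. The types are simplified stand-ins: here the user is passed as a plain username string, which is an assumption made to keep the example short. AclAuthorizationManager and its allow method are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Sketch only: simplified stand-ins for the authorization SPI types. */
class AuthorizationSketch {
  static final class AuthorizationResponse {
    private final boolean valid;
    private final String docid;

    AuthorizationResponse(boolean valid, String docid) {
      this.valid = valid;
      this.docid = docid;
    }

    boolean isValid() { return valid; }
    String getDocid() { return docid; }
  }

  interface AuthorizationManager {
    Collection<AuthorizationResponse> authorizeDocids(
        Collection<String> docids, String username);
  }

  /** Hypothetical manager backed by an in-memory access control list. */
  static class AclAuthorizationManager implements AuthorizationManager {
    private final Map<String, Set<String>> readersByDocid = new HashMap<>();

    void allow(String docid, String... usernames) {
      readersByDocid.put(docid, new HashSet<>(Arrays.asList(usernames)));
    }

    @Override
    public Collection<AuthorizationResponse> authorizeDocids(
        Collection<String> docids, String username) {
      List<AuthorizationResponse> responses = new ArrayList<>();
      for (String docid : docids) {
        Set<String> readers = readersByDocid.get(docid);
        // Documents with no ACL entry are treated as not authorized.
        boolean permitted = readers != null && readers.contains(username);
        responses.add(new AuthorizationResponse(permitted, docid));
      }
      return responses;
    }
  }
}
```

The sketch returns one response per requested document ID, in request order, and denies access to any document it does not know about.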
The Session.getTraversalManager method returns the
TraversalManager class object. The traversal manager
acquires documents from the content management system. (Traversing
means to acquire documents from the content management system.)
The TraversalManager interface functions are shown
in the following illustration:

The TraversalManager interface provides the
startTraversal and resumeTraversal methods
for acquiring documents from a content management system, and the
setBatchHint method for specifying the size of the
DocumentList class object. The traversal manager issues
queries to retrieve documents for a connector that provides a content
feed, or to retrieve a URL for a connector that provides a metadata-and-URL feed.
The connector manager can use the traversal manager to start a traversal
from the beginning of a content management system or resume a previous traversal.
A traversal resolves into a document list of traversed documents or URLs and metadata.
The connector manager traversal sequence is as follows:
1. The connector manager calls the setBatchHint method to inform the connector that the number of documents to traverse need not be higher than the number specified in the setBatchHint method. By receiving the batch hint first, the connector knows how many documents to return when the startTraversal or resumeTraversal method completes.
2. The connector manager calls the startTraversal method to start acquiring documents from the content management system. The connector manager receives a list of documents from the startTraversal method as a DocumentList.
3. The connector manager calls the nextDocument method to get each document. Each Document returned is fed to the search appliance. The connector sets the return value of the DocumentList object to instruct the connector manager what to do next. A null return value indicates that traversal is done. An empty return value indicates that the connector manager should set the checkpoint and keep traversing, and documents in the DocumentList indicate that the connector manager should send the documents to the search appliance.
4. The connector sets the checkpoint after every batch of documents. The checkpoint helps the connector remember where within the content management system the traversal was paused, so that the connector may resume at that point in the future.
5. The connector manager calls the setBatchHint method to inform the connector that the number of documents to traverse need not be higher than the number specified in the setBatchHint method.
6. The connector manager calls the resumeTraversal method, supplying it the previously remembered checkpoint to indicate where to resume acquiring documents from the content management system. The resumeTraversal method returns a DocumentList identifying documents to traverse in this batch. The connector manager returns to Step 3 to process the DocumentList and repeats Steps 3 to 6 indefinitely.
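The traversal sequence above can be sketched as a loop. This is an illustration under stated assumptions, not the connector manager's actual code: the nested interfaces are simplified stand-ins for the SPI types, documents are represented as plain strings, and InMemoryTraversalManager and its index-based checkpoint are hypothetical. The real connector manager waits and retries after a null result, whereas this sketch simply stops.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: simplified stand-ins for the traversal SPI types. */
class TraversalLoopSketch {
  interface DocumentList {
    String nextDocument();  // returns null when the batch is exhausted
    String checkpoint();
  }

  interface TraversalManager {
    void setBatchHint(int hint);
    DocumentList startTraversal();
    DocumentList resumeTraversal(String checkpoint);
  }

  /** Hypothetical repository; the checkpoint is the next index to read. */
  static class InMemoryTraversalManager implements TraversalManager {
    private final List<String> repository;
    private int batchHint = 100;

    InMemoryTraversalManager(List<String> repository) {
      this.repository = repository;
    }

    @Override public void setBatchHint(int hint) { batchHint = hint; }

    @Override public DocumentList startTraversal() { return batchFrom(0); }

    @Override public DocumentList resumeTraversal(String checkpoint) {
      return batchFrom(Integer.parseInt(checkpoint));
    }

    private DocumentList batchFrom(final int start) {
      if (start >= repository.size()) {
        return null;  // nothing left to traverse
      }
      final int end = Math.min(start + batchHint, repository.size());
      return new DocumentList() {
        private int next = start;

        @Override public String nextDocument() {
          return next < end ? repository.get(next++) : null;
        }

        @Override public String checkpoint() {
          return Integer.toString(next);
        }
      };
    }
  }

  /** Plays the connector manager's role for one pass over the repository. */
  static List<String> traverse(TraversalManager manager, int batchHint) {
    List<String> fed = new ArrayList<>();
    manager.setBatchHint(batchHint);                // announce the batch size
    DocumentList batch = manager.startTraversal();  // first batch
    while (batch != null) {
      String doc;
      while ((doc = batch.nextDocument()) != null) {
        fed.add(doc);  // a real connector manager feeds the search appliance here
      }
      String checkpoint = batch.checkpoint();       // remember the position
      manager.setBatchHint(batchHint);              // repeat the hint
      batch = manager.resumeTraversal(checkpoint);  // next batch
    }
    return fed;
  }
}
```

With a five-document repository and a batch hint of 2, the loop in this sketch processes three batches and then receives null from resumeTraversal, which ends the pass.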
After traversing the most recent documents in a content management
system, the startTraversal and resumeTraversal
methods return the documents in a DocumentList from which
the connector manager can extract documents, properties, and property
values. The SimpleDocumentList class is a basic implementation
of the DocumentList interface.
The DocumentList interface instantiation sequence is as follows:
The DocumentList interface provides the checkpoint
method, which indicates the current position within the document list, that is,
where a resumeTraversal call should start. The nextDocument
method gets the next document from the document list that the connector acquires
from the content management system.
For more information, see Iterating Over a Document List.
The Document interface gets the properties from a document. This
interface enables the connector manager to extract property values and names from
a document. Properties provide information about the document, such as a document ID,
MIME type, and last-modified date, that the connector manager uses to process documents.
The Document interface instantiation sequence is as follows:
The Document interface gets the properties from the
document list. For more information, see Metadata Properties. The
findProperty method returns the Property object for the property that
the connector manager asks the connector for. The getPropertyNames method
returns a list of all properties that the connector makes available.
The findProperty method may be called in situations where the property
would not be returned from the getPropertyNames method. The connector manager
may ask a connector for a property that the connector does not supply. When this occurs,
the connector must not throw an exception. The connector can return null for the
findProperty method.
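The null-return rule for findProperty can be illustrated as follows. The nested interfaces are simplified stand-ins in which property values are plain strings. MapDocument is hypothetical, and the property names used with it ("docid", "author") are illustrative keys rather than the names the SPI defines.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Sketch only: simplified stand-ins for the Document and Property types. */
class DocumentSketch {
  interface Property {
    String nextValue();  // returns null when no values remain
  }

  interface Document {
    Property findProperty(String name);
    Set<String> getPropertyNames();
  }

  /** Hypothetical document backed by a map of property names to values. */
  static class MapDocument implements Document {
    private final Map<String, List<String>> properties;

    MapDocument(Map<String, List<String>> properties) {
      this.properties = new LinkedHashMap<>(properties);
    }

    @Override
    public Property findProperty(String name) {
      List<String> values = properties.get(name);
      if (values == null) {
        return null;  // unknown property: return null, do not throw
      }
      final Iterator<String> it = values.iterator();
      return new Property() {
        @Override public String nextValue() {
          return it.hasNext() ? it.next() : null;
        }
      };
    }

    @Override
    public Set<String> getPropertyNames() {
      return Collections.unmodifiableSet(properties.keySet());
    }
  }
}
```

In this sketch, a request for a property the document does not carry, such as "author" on a document that only has "docid", returns null from findProperty without raising an exception.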
The SimpleDocument class is a basic implementation of the
Document interface.
A property provides information about a document, such as its document ID, MIME type, and last-modified date. For more information, see Metadata Properties.
The Property interface instantiation sequence is as follows:

The Property interface gets each property value.
The SimpleProperty class is a basic implementation of the Property interface.
The Value objects are used to provide property values from the
connector to the connector manager as returned from calls to the
Property.nextValue method.
The Value class provides factory methods for constructing
Value objects for various data types.
A connector cannot implement its own Value class or even
subclass the one provided (unlike most of the other interfaces, for which
a connector must provide an implementation).
The methods in the Value class are as follows:

The Value class provides conversion methods for
numeric and string values. For more information on the Calendar object formats, see
RFC 822 and
Date and Time Formats, which describes
ISO 8601.
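To show what the two date formats look like, the following sketch formats a Calendar using only java.text.SimpleDateFormat from the Java standard library. It does not use or reproduce the Value class's own conversion methods. The class and method names are hypothetical, and the RFC 822 style shown uses a four-digit year.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Locale;
import java.util.TimeZone;

/** Sketch only: standard-library formatting, not the Value class itself. */
class DateFormatSketch {
  /** Formats the calendar's instant as an ISO 8601 timestamp in UTC. */
  static String toIso8601(Calendar calendar) {
    SimpleDateFormat format =
        new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'", Locale.US);
    format.setTimeZone(TimeZone.getTimeZone("UTC"));
    return format.format(calendar.getTime());
  }

  /** Formats the calendar's instant in an RFC 822 style, in GMT. */
  static String toRfc822(Calendar calendar) {
    SimpleDateFormat format =
        new SimpleDateFormat("EEE, dd MMM yyyy HH:mm:ss z", Locale.US);
    format.setTimeZone(TimeZone.getTimeZone("GMT"));
    return format.format(calendar.getTime());
  }
}
```

For a calendar set to 10:30:00 UTC on December 1, 2009, toIso8601 in this sketch produces 2009-12-01T10:30:00Z.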
The ConnectorType interface provides methods that configure a connector instance. The SimpleConnectorType class is a basic implementation of the ConnectorType interface.
The ConnectorType interface instantiation sequence is as follows:
1. The connector manager creates a ConnectorType object using the parameters in the connectorType.xml file.
2. The connector manager calls the getConfigForm method to provide an XHTML configuration form in which an administrator can specify connector parameter values in the Admin Console.
3. The connector manager calls the validateConfig method to verify the form information. The validateConfig method can also use the ConnectorFactory interface to instantiate a connector and verify that the connector instance can communicate with the content management system. If validation fails, the validateConfig method adds a message to the configuration form for the administrator to correct the form information. The getConfigData, getFormSnippet, and getMessage methods only work with validateConfig as indicated by the red line in the illustration.
4. The connector manager calls the getPopulatedConfigForm method to retrieve stored information and create an XHTML form with the information.
5. Each method returns a ConfigureResponse object that can contain a configuration form and an optional message to display on the Admin Console if corrections are needed.
The ConnectorType interface provides methods that
the connector manager calls to supply rows in an XHTML table as a
form to the Admin Console so that an administrator can specify or
change parameter values for a connector. For more information and an
example of configuration form XHTML table rows, see
Creating a Configuration Form.
For information on how a search appliance, connector manager, and
a connector handle the tasks associated with each ConnectorType method,
see Following Connector Type and Implementation Processes.
The ConnectorFactory interface enables the
ConnectorType.validateConfig method to instantiate
a temporary connector instance to ensure that the content management
system responds to the host and port provided in the configuration
form. If this test fails, the validateConfig method can
send the configuration form back to the Admin Console to get the
correct access information for the content management system.
The ConnectorFactory interface also gives a connector
access to the values in the connectorInstance.xml after
the connector is instantiated. Access to this XML file enables a
connector to view the properties in the file and potentially modify
information to pass in the ConfigureResponse object that
the validateConfig method returns.
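The interaction between validateConfig and ConnectorFactory might be sketched as follows. The nested types are simplified stand-ins modeled on the SPI names in this section. The "host" parameter name, the form row, and the messages are hypothetical. Returning null for a valid configuration is an assumption of this sketch.

```java
import java.util.Map;

/** Sketch only: simplified stand-ins for the configuration SPI types. */
class ValidateConfigSketch {
  interface Connector {}

  interface ConnectorFactory {
    Connector makeConnector(Map<String, String> config) throws Exception;
  }

  static final class ConfigureResponse {
    private final String message;
    private final String formSnippet;

    ConfigureResponse(String message, String formSnippet) {
      this.message = message;
      this.formSnippet = formSnippet;
    }

    String getMessage() { return message; }
    String getFormSnippet() { return formSnippet; }
  }

  /** Hypothetical XHTML table row for a single "host" parameter. */
  static final String HOST_ROW =
      "<tr><td>Host</td><td><input type=\"text\" name=\"host\"/></td></tr>";

  /** Returns null when the configuration is acceptable (sketch assumption). */
  static ConfigureResponse validateConfig(
      Map<String, String> config, ConnectorFactory factory) {
    String host = config.get("host");
    if (host == null || host.trim().isEmpty()) {
      return new ConfigureResponse("Enter a host name.", HOST_ROW);
    }
    try {
      // Temporary instance, used only to test connectivity during validation.
      factory.makeConnector(config);
    } catch (Exception e) {
      return new ConfigureResponse("Cannot connect to " + host + ".", HOST_ROW);
    }
    return null;
  }
}
```

The sketch checks the form values first and only then asks the factory for a temporary connector, so an obviously incomplete form never triggers a connection attempt.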
The ConnectorFactory interface instantiation sequence is as follows:

The makeConnector method returns a Connector object, which exists only while the ConnectorType.validateConfig method runs.