Google Search Appliance software version 6.2
Connector manager version 2.4.0
Posted December 2009
This section provides a tutorial in which you create a simple Java connector that connects to a simulated content management system, so that you can see how connector development works. The tutorial uses Java and Eclipse, both freely available, so that you can follow along step by step. Eclipse is a commonly used tool for Java programming. Even if you normally use another integrated development environment (IDE), we recommend that you use Eclipse for this tutorial so that you can follow each step. Using an IDE simplifies the development procedures and lets you develop the connector in a production programming environment.
For connector terminology definitions, see the Google Enterprise Glossary.
This section shows you how to write a simple "Hello World" connector. To simplify the code, the connector does not use a real content management system. Instead, the connector repeatedly provides the same document to the search appliance and uses a simulated URL for the document. The content of the document is "Hello World."
To complete this tutorial, you need:
This workstation needs network access to the search appliance. The workstation can run any operating system that supports a Java Development Kit (JDK)--this tutorial describes use with Windows.
The sample files that you use in this tutorial are available in the current documentation folder:
In this tutorial, you will complete the following activities:
In this section, you set up the connector development environment by performing the following activities:
In this section, you install the Java Development Kit (JDK), Eclipse, and Apache Tomcat (the servlet container).
To set up the connector development environment:
The JDK is available from http://java.sun.com. The connector framework supports any version of the JDK.
Eclipse is available from http://www.eclipse.org. You can develop connectors using any integrated development environment (IDE) such as Eclipse. For this tutorial, you need Eclipse to complete the tasks. (The procedures use Eclipse version 3.3.2.)
CATALINA_HOME
for the Variable name, and then enter the path to the Tomcat download folder, for example, c:\apache-tomcat-6.0.18\
as the Variable value.
echo $CATALINA_HOME
env CATALINA_HOME=/usr/local/apps/apache-tomcat-6.0.18
~/.bashrc
file: export CATALINA_HOME=/usr/local/apps/apache-tomcat-6.0.18
bin
and webapps
folders to be used in
Deploying the Connector Manager on Apache Tomcat.
%CATALINA_HOME%\bin
\startup
$CATALINA_HOME/bin
/startup.sh
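Before moving on, it can help to confirm that CATALINA_HOME points at a usable Tomcat folder. The following is a minimal sketch, assuming Java 7 or later for java.nio.file; the class name TomcatHomeCheck and its method are hypothetical helpers and are not part of Tomcat or the connector manager. It only checks for the bin and webapps folders mentioned above.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; not part of Tomcat or the connector manager. */
final class TomcatHomeCheck {

    /** Returns the expected subfolder names that are missing under the given folder. */
    static List<String> missingFolders(Path home) {
        List<String> missing = new ArrayList<>();
        for (String name : new String[] {"bin", "webapps"}) {
            if (!Files.isDirectory(home.resolve(name))) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Reads the same environment variable that the setup steps define.
        String value = System.getenv("CATALINA_HOME");
        if (value == null || value.isEmpty()) {
            System.out.println("CATALINA_HOME is not set");
            return;
        }
        List<String> missing = missingFolders(Paths.get(value));
        System.out.println(missing.isEmpty()
            ? "CATALINA_HOME looks usable: " + value
            : "Missing under CATALINA_HOME: " + missing);
    }
}
```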
In this section, you download the open source connector manager software:
In the Featured Downloads section, choose which compressed file to download:
Platform | Binary File |
---|---|
Linux or Macintosh | connector-manager-2.4.0.tar.gz |
Windows | connector-manager-2.4.0.zip |
In this section, after opening the connector manager compressed file from the previous section, you need to extract the connector manager .war
file, copy it, and extract the .jar
file for the SPI. The service provider interface .jar
file provides classes for each interface that is coded in the example Hello World connector file that you can create in this tutorial.
The instructions that follow are for a Windows system. If you are using Linux or Macintosh, refer to your operating system documentation for how to open a .tar
file, and how to extract, copy, and rename files.
To extract the connector-spi.jar
file:
connector-manager-2.4.0
. connector-manager.war
file and store it in a folder on your computer. connector-manager.war
file so that you have two copies of the same file. connector-manager.war
file to be connector-manager.zip
. connector-manager.zip
file, open the WEB-INF
folder, and open the lib
folder. connector-spi.jar
file to the current folder on your computer.
You will need to use the connector-spi.jar
file in the later section, Adding the Connector Manager JAR file to the Eclipse Project.
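As an alternative to copying and renaming the .war file by hand, the extraction can be scripted, because a .war file is a ZIP archive. The following is a rough sketch, assuming Java 7 or later; the class name SpiJarExtractor is hypothetical, and the entry path is taken from the WEB-INF and lib folders named in the steps above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

/** Hypothetical helper that pulls connector-spi.jar out of the connector manager .war file. */
final class SpiJarExtractor {

    // Entry path assumed from the WEB-INF/lib folders described in the manual steps.
    static final String SPI_ENTRY = "WEB-INF/lib/connector-spi.jar";

    /** Copies the SPI JAR entry from the .war file into targetDir and returns the new file. */
    static Path extract(Path war, Path targetDir) throws IOException {
        try (ZipFile zip = new ZipFile(war.toFile())) {
            ZipEntry entry = zip.getEntry(SPI_ENTRY);
            if (entry == null) {
                throw new IOException(SPI_ENTRY + " not found in " + war);
            }
            Path target = targetDir.resolve("connector-spi.jar");
            try (InputStream in = zip.getInputStream(entry)) {
                Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            }
            return target;
        }
    }

    public static void main(String[] args) throws IOException {
        Path war = Paths.get(args.length > 0 ? args[0] : "connector-manager.war");
        System.out.println("Extracted " + extract(war, Paths.get(".")));
    }
}
```

Reading the archive directly leaves the original connector-manager.war file untouched, so no second copy is needed for deployment.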
In this section, create a new project in Eclipse for use in this tutorial.
To create a Java project in Eclipse:
Locate the folder where you installed the Eclipse software:
eclipse.exe
executable.
eclipse
command at a Terminal. If this does not work, you can locate Eclipse using the whereis eclipse
command, and then enter the full directory path.
Click File > New > Project > Java Project and click Next. The New Java Project window appears.
If you have Java perspective set, you can create the project with File > New > Java Project.
Enter the HelloWorldConnector project name. You can leave the default settings as they appear.
The new project is created and saved in the default workspace.
Click Window > Show View > Package Explorer.
lib
folder.
HelloWorldConnector
project name in Package Explorer and
click New > Folder.
lib
and click Finish.
In this section, add the connector manager's SPI .jar
file to Eclipse's working folders.
To add the .jar
file to the lib
folder that you created in Step 6 of
Creating a Java Project in Eclipse:
connector-spi.jar
file.
Return to the folder where the connector-spi.jar
file resides (as described in Step 6 of
Extracting the Connector SPI JAR File) and copy the file to the clipboard.
lib
folder.
connector-spi.jar
file.
In Eclipse, right click the lib
folder and click Paste to put the connector-spi.jar
file in the lib
folder.
To register the connector-spi.jar
file with Eclipse:
Right click the lib
folder and click Refresh from the menu. The connector-spi.jar
entry appears.
(A build path indicates to Eclipse how to find the code for compiling.) Right click the connector-spi.jar
file and click Build Path > Add to Build Path.
Note: The connector-spi.jar
file moves from the lib
folder to the Referenced Libraries folder.
Copying the example Hello World connector requires that you create two classes, one for the connector and another for the connector type. A class is a construct for creating objects.
The connector type class identifies the connector to the Admin Console of a Google Search Appliance and provides properties that the connector manager uses to generate a connector instance. These classes will eventually become part of the .jar
file that the connector manager uses to create an instance of the connector and to identify the connector to the Admin Console.
To create the class for the connector:
src
element.
Field Name | Enter This Value |
---|---|
Package | com.example.connector |
Name | HelloWorldConnector |
You can leave all other settings as they appear.
To create the class for the connector type:
HelloWorldConnectorType
and click Finish to close the window.
By default, Eclipse compiles (builds) projects automatically. If a build does not occur after you insert the Hello World example, click Project > Build Automatically.
In this section, you add parameter files to your project, create a JAR file, copy the file to the web server, and test the connector.
Deploying the connector consists of:
In this section, you create the connector instance and connector type XML files that, when installed, enable the connector manager to create a connector instance and identify the connector to the Admin Console.
To create the XML files:
src
folder in the HelloWorldConnector
project, click New > Folder, specify the Folder name as config
, and click Finish. config
folder, click New > File, enter the file name as connectorInstance.xml
, and click Finish. connectorType.xml
file in the config
folder using the contents of the connectorType.xml.txt file.
In this section, you create the JAR file that, when installed on the Apache Tomcat server, gives the connector manager and the Admin Console access to the connector.
To create the .jar
file:
src
folder and click Export. The Export window opens. Expand Java, click JAR file, and click Next. The JAR Export window opens. src
and lib
folders. HelloWorldConnector.jar
and click Finish.
In this section, you need to do essentially the same steps twice--starting and stopping Tomcat to copy the connector manager's .war
file to the Tomcat server, and then repeating the process for the connector's .jar
file. The reason is that the first procedure causes Tomcat to create new folders, and in the second procedure, you add the connector to the folders.
Before starting, return to the folder where you installed Tomcat as described in the previous section,
Installing Software Components. You will also need the connector-manager.war
file described in the previous section, Extracting the Connector SPI JAR File.
To deploy the connector manager on Tomcat:
%CATALINA_HOME%\bin
\shutdown
$CATALINA_HOME/bin
/shutdown.sh
.war
file to Tomcat.
connector-manager.war
file to the %CATALINA_HOME%\webapps
folder.
connector-manager.war
file to the $CATALINA_HOME/webapps
folder.
%CATALINA_HOME%\bin
\startup
$CATALINA_HOME/bin
/startup.sh
After Tomcat starts, verify that the server is running by clicking this link: http://127.0.0.1:8080/connector-manager/testConnectivity or by typing this address into a browser. This link verifies that the connector manager is running and displays the connector manager version, Java version, the operating system version, and the IP address of the search appliance.
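The same check can be made from code instead of a browser. The following is a minimal sketch, assuming the connector manager is reachable at the address shown above and returns a text response; the class name ConnectivityCheck and its methods are hypothetical, and the sketch makes no assumption about the format of the response body beyond printing it.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper that requests the testConnectivity page of a connector manager. */
final class ConnectivityCheck {

    /** Appends the testConnectivity path to a connector manager base URL. */
    static String testUrl(String baseUrl) {
        String trimmed = baseUrl.endsWith("/")
            ? baseUrl.substring(0, baseUrl.length() - 1)
            : baseUrl;
        return trimmed + "/testConnectivity";
    }

    /** Fetches the URL and returns the response body; fails on any non-200 status. */
    static String fetch(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setConnectTimeout(5000);
        conn.setReadTimeout(5000);
        try {
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("Unexpected HTTP status " + status);
            }
            StringBuilder body = new StringBuilder();
            // UTF-8 is an assumption about the response encoding.
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    body.append(line).append('\n');
                }
            }
            return body.toString();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(fetch(testUrl("http://127.0.0.1:8080/connector-manager")));
    }
}
```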
To deploy your connector on Tomcat:
%CATALINA_HOME%\bin
\shutdown
$CATALINA_HOME/bin/shutdown.sh
.jar
file to Tomcat.
HelloWorldConnector.jar
file to the
%CATALINA_HOME%\webapps\connector-manager\WEB-INF\lib
folder.
HelloWorldConnector.jar
file to the
$CATALINA_HOME/webapps/connector-manager/WEB-INF/lib
folder.
Do not restart Tomcat yet. You restart it in the next section, Configuring the Admin Console.
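The copy step for the connector can also be scripted. The following is a sketch only, assuming Java 7 or later; the class name ConnectorDeployer is hypothetical. It refuses to copy if the WEB-INF/lib folder does not exist yet, which reflects the point above that Tomcat creates those folders the first time it starts with the connector manager's .war file in place.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper that copies a connector JAR into the deployed connector manager. */
final class ConnectorDeployer {

    /** Copies the JAR into webapps/connector-manager/WEB-INF/lib under the Tomcat folder. */
    static Path deploy(Path jar, Path catalinaHome) throws IOException {
        Path libDir = catalinaHome.resolve("webapps")
            .resolve("connector-manager")
            .resolve("WEB-INF")
            .resolve("lib");
        if (!Files.isDirectory(libDir)) {
            // The folder appears only after Tomcat has unpacked connector-manager.war.
            throw new IOException("Missing " + libDir
                + "; deploy the connector manager and start Tomcat once first");
        }
        return Files.copy(jar, libDir.resolve(jar.getFileName()),
            StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        String home = System.getenv("CATALINA_HOME");
        if (home == null) {
            System.out.println("CATALINA_HOME is not set");
            return;
        }
        System.out.println("Copied to "
            + deploy(Paths.get("HelloWorldConnector.jar"), Paths.get(home)));
    }
}
```

Run a helper like this only while Tomcat is stopped, as the manual steps require.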
The information that follows explains how to use the Admin Console of the Google Search Appliance to configure access to the connector manager and your connector.
You need the machine name where you installed Apache Tomcat (the servlet container), the domain name for your site, and the administrative user name and password for the Admin Console.
To configure the Admin Console for a connector:
%CATALINA_HOME%\bin
\startup
$CATALINA_HOME/bin
/startup.sh
After Tomcat starts, verify that the server is running by browsing to http://127.0.0.1:8080/connector-manager/testConnectivity. This link verifies that the connector manager is running and displays the connector manager version, Java version, and the operating system version.
^googleconnector://
Note: If this crawl pattern is not present, the search appliance rejects documents from a content feed connector, which this connector mimics. Metadata-and-URL connectors do not require the use of the googleconnector
crawl pattern.
Alternatively, you can list the IP address of the server where Apache Tomcat is installed in the Only trust feeds from these IP addresses field. Click Save Settings.
Field | Value |
---|---|
Manager name | test-cm |
Description | Test connector manager |
Location | Connector_Manager_URL |
For Connector_Manager_URL, the name of the computer on which you installed Apache Tomcat becomes the start of a URL that uses your local site's domain name. For example, if the computer name is mycomputer
in the example.com
domain, the URL would be as follows: http://mycomputer.example.com:8080/connector-manager
When the Admin Console is able to access the connector manager on the network, a green dot appears next to the connector manager name in the Connector Manager Administration page and "New Connector Manager successfully added" displays.
You can test that the connector type is deployed by browsing to http://127.0.0.1:8080/connector-manager/getConnectorList, which displays the installed connector types.
The Connector Administration > Connectors page of the Admin Console appears.
The Connector Configuration page appears.
The configuration form page appears.
Your connector is now added to the Admin Console.
To test a search query:
hello
in the search field.
Hello world
appears as a document entry, but with a link that does not produce a result. This entry validates that the connector is working.
Congratulations! You just created a simple connector. You can examine the two sample Java files. They contain implementation classes for the connector SPI. You also learned how XML files are packaged as part of the .jar
file.
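If the connector does not appear in the Admin Console, one thing worth checking is whether the XML files and classes were actually packaged in the .jar file. The following is a rough sketch, assuming Java 7 or later; the class name ConnectorJarCheck is hypothetical, and the expected entry paths are an assumption derived from the config folder created under src and the com.example.connector package used in this tutorial.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarFile;

/** Hypothetical helper that lists expected entries missing from the connector JAR. */
final class ConnectorJarCheck {

    // Assumed entry paths, based on the folders and package names used in this tutorial.
    static final String[] EXPECTED = {
        "config/connectorType.xml",
        "config/connectorInstance.xml",
        "com/example/connector/HelloWorldConnector.class",
        "com/example/connector/HelloWorldConnectorType.class"
    };

    /** Returns the expected entry names that are not present in the JAR file. */
    static List<String> missingEntries(Path jar) throws IOException {
        List<String> missing = new ArrayList<>();
        try (JarFile jarFile = new JarFile(jar.toFile())) {
            for (String name : EXPECTED) {
                if (jarFile.getEntry(name) == null) {
                    missing.add(name);
                }
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        Path jar = Paths.get(args.length > 0 ? args[0] : "HelloWorldConnector.jar");
        List<String> missing = missingEntries(jar);
        System.out.println(missing.isEmpty()
            ? "All expected entries are present"
            : "Missing entries: " + missing);
    }
}
```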
If you need a further example, you can build and deploy the test connector that comes with the connector manager, and which contains a mock repository. See Appendix A: Building a Debug Connector Manager for more information.
For further information about the code examples, see:
This section explains the source code for the example Hello World connector.
The following statements declare the Java package for the fictional Example connector, and declare the Google SPI interfaces and classes.
package com.example.connector;

import com.google.enterprise.connector.spi.AuthenticationIdentity;
import com.google.enterprise.connector.spi.AuthenticationManager;
import com.google.enterprise.connector.spi.AuthenticationResponse;
import com.google.enterprise.connector.spi.AuthorizationManager;
import com.google.enterprise.connector.spi.AuthorizationResponse;
import com.google.enterprise.connector.spi.Connector;
import com.google.enterprise.connector.spi.Document;
import com.google.enterprise.connector.spi.DocumentList;
import com.google.enterprise.connector.spi.Property;
import com.google.enterprise.connector.spi.RepositoryException;
import com.google.enterprise.connector.spi.Session;
import com.google.enterprise.connector.spi.SimpleDocument;
import com.google.enterprise.connector.spi.SpiConstants;
import com.google.enterprise.connector.spi.TraversalManager;
import com.google.enterprise.connector.spi.Value;
The next statements import Java functionality into the connector. You can view the java.util classes at the Sun Java site.
import java.util.ArrayList;
import java.util.Calendar;
import java.util.Collection;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
The next statement declares the connector's HelloWorldConnector
class as implementing the Connector
interface. This is the start of the connector application code.
public class HelloWorldConnector implements Connector {
The next statements declare the Session
interface and the HelloSession
class that implements the Session
interface. The HelloSession
class instantiates managers for traversing the content management system, and authenticating and authorizing users.
  public Session login() {
    return new HelloSession();
  }

  public class HelloSession implements Session {
    public AuthenticationManager getAuthenticationManager() {
      return new HelloAuthenticationManager();
    }

    public AuthorizationManager getAuthorizationManager() {
      return new HelloAuthorizationManager();
    }

    public TraversalManager getTraversalManager() {
      return new HelloTraversalManager();
    }
  }
The HelloTraversalManager
implements the TraversalManager
interface and records the value passed to its setBatchHint
method. The connector manager calls setBatchHint to set the desired number of documents to return in each traversal batch.
  public class HelloTraversalManager implements TraversalManager {
    private static final int MAX_DOCID = 1000;
    private int batchHint = 10;

    public void setBatchHint(int hint) {
      batchHint = hint;
    }
The connector manager calls the startTraversal
method to start traversing a content management system for the first time. This method calls the traverse utility method to build the list of documents.
    public DocumentList startTraversal() {
      return traverse("0");
    }
The connector manager calls the resumeTraversal
method to resume traversing a content management system as needed to acquire the documents for the list of documents to index. This method calls the traverse utility method to build the list of documents.
    public DocumentList resumeTraversal(String checkpoint) {
      return traverse(checkpoint);
    }
The traverse
method returns batchHint
number of documents in each batch, until it returns a total of 1000 documents.
    /**
     * Utility method to produce a {@code DocumentList} containing
     * the next batch of {@code Document} from the checkpoint.
     *
     * @param checkpoint a String representing the last document
     *        number processed.
     */
    private DocumentList traverse(String checkpoint) {
      int startDocId = Integer.parseInt(checkpoint) + 1;
      if (startDocId > MAX_DOCID) {
        return null; // No more documents.
      }
      Calendar cal = Calendar.getInstance();
      List<Document> docList = new ArrayList<Document>(batchHint);
      int endDocId = Math.min(startDocId + batchHint - 1, MAX_DOCID);
      for (int i = startDocId; i <= endDocId; i++) {
        // Every document gets the same fixed timestamp (10 seconds after the epoch).
        cal.setTimeInMillis(10 * 1000);
        Map<String, List<Value>> properties;
        properties = new HashMap<String, List<Value>>();
        properties.put(SpiConstants.PROPNAME_DOCID,
            asList(Value.getStringValue(Integer.toString(i))));
        properties.put(SpiConstants.PROPNAME_LASTMODIFIED,
            asList(Value.getDateValue(cal)));
        properties.put(SpiConstants.PROPNAME_DISPLAYURL,
            asList(Value.getStringValue("http://www.example.com/?docid=" + i)));
        properties.put(SpiConstants.PROPNAME_CONTENT,
            asList(Value.getBinaryValue("Hello World!".getBytes())));
        docList.add(new SimpleDocument(properties));
      }
      return new HelloWorldDocumentList(docList);
    }

    private List<Value> asList(Value value) {
      List<Value> list = new LinkedList<Value>();
      list.add(value);
      return list;
    }
  }
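The batch arithmetic in traverse can be tried on its own, without the SPI classes. The following is a minimal sketch that reuses the same calculation; the class name BatchWindow and the method next are hypothetical stand-ins, and the batch hint of 300 in main is an arbitrary value chosen for illustration.

```java
/** Hypothetical stand-in that isolates the batch arithmetic used by traverse. */
final class BatchWindow {

    static final int MAX_DOCID = 1000;

    /**
     * Returns the first and last document IDs (inclusive) of the next batch,
     * or null when the checkpoint is already at the last document.
     */
    static int[] next(String checkpoint, int batchHint) {
        int start = Integer.parseInt(checkpoint) + 1;
        if (start > MAX_DOCID) {
            return null; // No more documents.
        }
        int end = Math.min(start + batchHint - 1, MAX_DOCID);
        return new int[] {start, end};
    }

    public static void main(String[] args) {
        // Start from "0", as startTraversal does, then resume from each batch's last ID.
        String checkpoint = "0";
        int[] window;
        while ((window = next(checkpoint, 300)) != null) {
            System.out.println(window[0] + ".." + window[1]);
            checkpoint = Integer.toString(window[1]);
        }
    }
}
```

With a batch hint of 300, the loop produces four windows, and the last one is cut short at document 1000 by the Math.min call.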
The HelloWorldDocumentList
class iterates through the list of documents from the content management system and reports a checkpoint for the last document that it returned.
  class HelloWorldDocumentList implements DocumentList {
    private Iterator<Document> iterator;
    private Document document;

    public HelloWorldDocumentList(List<Document> documents) {
      this.iterator = documents.iterator();
      this.document = null;
    }

    public Document nextDocument() {
      if (iterator.hasNext()) {
        document = iterator.next();
        return document;
      }
      return null;
    }

    public String checkpoint() throws RepositoryException {
      if (document != null) {
        Property docId = document.findProperty(SpiConstants.PROPNAME_DOCID);
        return docId.nextValue().toString();
      }
      return null;
    }
  }
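One detail of this class is easy to miss: the checkpoint tracks the last document handed out by nextDocument, not the last document in the batch. The following is a simplified sketch of that behavior, using plain integers in place of Document objects so that it runs without the SPI; the names CheckpointDemo, IdList, and nextId are hypothetical.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical demo of how the checkpoint follows the last item handed out. */
final class CheckpointDemo {

    /** Simplified stand-in for HelloWorldDocumentList; integer IDs replace documents. */
    static final class IdList {
        private final Iterator<Integer> iterator;
        private Integer current; // Last ID returned by nextId(), or null if none yet.

        IdList(List<Integer> ids) {
            this.iterator = ids.iterator();
        }

        Integer nextId() {
            if (iterator.hasNext()) {
                current = iterator.next();
                return current;
            }
            return null; // Exhausted; current keeps the last ID returned.
        }

        String checkpoint() {
            return current == null ? null : current.toString();
        }
    }

    public static void main(String[] args) {
        IdList list = new IdList(Arrays.asList(1, 2, 3, 4, 5));
        list.nextId();
        list.nextId();
        // Only two of the five IDs were consumed, so the checkpoint is "2".
        System.out.println(list.checkpoint());
    }
}
```

If a caller stops consuming a batch early, resuming from this checkpoint picks up at the next unconsumed ID rather than skipping the rest of the batch.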
The HelloAuthenticationManager
class implements the AuthenticationManager
interface and indicates that a potential user is successfully authenticated.
  class HelloAuthenticationManager implements AuthenticationManager {
    public AuthenticationResponse authenticate(AuthenticationIdentity id) {
      return new AuthenticationResponse(true, null);
    }
  }
The HelloAuthorizationManager
class implements the AuthorizationManager
interface and indicates that the authenticated user is authorized to view all of the documents in the supplied Collection
. This set of statements also concludes the HelloWorldConnector
class.
  class HelloAuthorizationManager implements AuthorizationManager {
    public Collection<AuthorizationResponse> authorizeDocids(
        Collection<String> docIds, AuthenticationIdentity id) {
      ArrayList<AuthorizationResponse> authorized =
          new ArrayList<AuthorizationResponse>(docIds.size());
      for (String docId : docIds) {
        authorized.add(new AuthorizationResponse(true, docId));
      }
      return authorized;
    }
  }
} // end HelloWorldConnector class
The code for the example connector type creates a simple configuration form as a row in an XHTML table and demonstrates the use of the following methods:
getConfigForm - Requests the configuration form for the connector. The form displays on the Admin Console so that an administrator can provide connector settings.
validateConfig - Validates the information that the administrator entered in the configuration form.
getPopulatedConfigForm - Provides a configuration form when an administrator edits the settings for a connector.
The example connector type code is as follows:
package com.example.connector;

import com.google.enterprise.connector.spi.ConfigureResponse;
import com.google.enterprise.connector.spi.ConnectorFactory;
import com.google.enterprise.connector.spi.ConnectorType;

import java.util.Locale;
import java.util.Map;

public class HelloWorldConnectorType implements ConnectorType {
  String form =
      "<tr><td>fake form</td><td><input type=\"text\" name=\"aa\"/></td></tr>";

  public ConfigureResponse getConfigForm(Locale locale) {
    return new ConfigureResponse(null, form);
  }

  public ConfigureResponse validateConfig(Map<String, String> config,
      Locale locale, ConnectorFactory factory) {
    return null;
  }

  public ConfigureResponse getPopulatedConfigForm(Map<String, String> config,
      Locale locale) {
    return new ConfigureResponse("filled", form);
  }
}
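The example returns the same static form from getPopulatedConfigForm, so a previously saved value is not shown when an administrator edits the connector. One way a real connector type might fill the field is sketched below. This is an illustration only and not part of the example connector: the class name PopulatedFormSketch and both methods are hypothetical, and the sketch builds just the XHTML row as a string so that it runs without the SPI classes.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of pre-filling the example form row from saved settings. */
final class PopulatedFormSketch {

    /** Escapes characters that would break a double-quoted XHTML attribute value. */
    static String escapeAttribute(String value) {
        return value.replace("&", "&amp;")
            .replace("\"", "&quot;")
            .replace("<", "&lt;")
            .replace(">", "&gt;");
    }

    /** Builds the form row, adding a value attribute when the "aa" setting is present. */
    static String populatedRow(Map<String, String> config) {
        String value = config.get("aa");
        String attribute = value == null
            ? ""
            : " value=\"" + escapeAttribute(value) + "\"";
        return "<tr><td>fake form</td><td><input type=\"text\" name=\"aa\""
            + attribute + "/></td></tr>";
    }

    public static void main(String[] args) {
        Map<String, String> config = new HashMap<>();
        config.put("aa", "saved \"value\"");
        System.out.println(populatedRow(config));
    }
}
```

With an empty map, the method returns the same row as the static form in the example, so the unpopulated and populated cases could share one code path.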
The connectorType.xml
file provides parameters for the ConnectorType
object.
You may choose to expose the configuration form fields here or not as required by your design.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE beans PUBLIC "-//SPRING//DTD BEAN//EN"
  "http://www.springframework.org/dtd/spring-beans.dtd">
<beans>
  <bean id="helloworld-connector"
    class="com.example.connector.HelloWorldConnectorType">
  </bean>
</beans>
The connectorInstance.xml file enables administrators to set parameters for use in deploying a connector.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE beans PUBLIC "-//SPRING//DTD BEAN//EN"
  "http://www.springframework.org/dtd/spring-beans.dtd">
<beans>
  <bean id="helloworld-connector"
    class="com.example.connector.HelloWorldConnector">
  </bean>
</beans>