Google Search Appliance - Connector Developer's Guide: Getting Started

Google Search Appliance software version 6.2
Connector manager version 2.4.0
Posted December 2009

This section provides a tutorial in which you create a simple Java connector that connects to a simulated content management system, so that you can see how connector development works. The tutorial uses Java and Eclipse, both of which are free, and lets you follow along step by step. Eclipse is a commonly used tool for Java programming. Even if you normally use another integrated development environment (IDE), we recommend that you use Eclipse for this tutorial so that you can follow each step. Using an IDE simplifies the development procedures and lets you develop the connector in a production programming environment.

For connector terminology definitions, see the Google Enterprise Glossary.

Chapters: About This Guide, Introduction, Getting Started, SPI Overview, Traversing Documents,
Authentication, Authorization, Configuration, Appendix A: Building a Debug Connector Manager

Chapter Contents: Getting Started

  1. Writing a Hello World Connector
    1. Tutorial Prerequisites
    2. Tutorial Overview
    3. Setting Up the Connector Development Environment
      1. Installing Software Components
      2. Downloading the Connector Manager Software
      3. Extracting the Connector SPI JAR File
    4. Creating a Java Project in Eclipse
    5. Adding the Connector Manager JAR file to the Eclipse Project
    6. Copying the Hello World Connector to Eclipse
    7. Compiling the Connector in Eclipse
  2. Completing and Testing Your Connector
    1. Creating the XML Files
    2. Creating the Hello World Connector JAR File
    3. Deploying the Connector Manager and Connector on Apache Tomcat
      1. Deploying the Connector Manager
      2. Deploying Your Connector
    4. Configuring the Admin Console
    5. Testing a Search Query
  3. Summary
  4. Example Connector Code
    1. Package and Import Statements
    2. Declaring the Connector and Session
    3. Traversing the Example Content Management System
      1. Example startTraversal Method
      2. Example resumeTraversal Method
      3. Traversal Utility Method
      4. HelloWorldDocumentList Class
      5. HelloAuthenticationManager Class
      6. HelloAuthorizationManager Class
  5. Example Connector Type Code
  6. Example connectorType.xml Code
  7. Example connectorInstance.xml Code

Writing a Hello World Connector

This section shows you how to write a simple "Hello World" connector. To simplify the code, the connector does not use a real content management system. Instead, it repeatedly provides the same simulated document to the search appliance, giving each copy its own document ID and simulated URL. The content of every document is "Hello World!"

Tutorial Prerequisites

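To complete this tutorial, you need:

  • The JDK, version 1.5 or later
  • Eclipse
  • Apache Tomcat
  • The connector manager software, version 2.4.0
  • Administrator access to the Admin Console of a Google Search Appliance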

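The sample files that you use in this tutorial are available in the current documentation folder:

  • HelloWorldConnector.java.txt
  • HelloWorldConnectorType.java.txt
  • connectorInstance.xml.txt
  • connectorType.xml.txt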


Tutorial Overview

In this tutorial, you will complete the following activities:

  1. Setting Up the Connector Development Environment
  2. Creating a Java Project in Eclipse
  3. Adding the Connector Manager JAR file to the Eclipse Project
  4. Copying the Hello World Connector to Eclipse
  5. Compiling the Connector in Eclipse
  6. Completing and Testing Your Connector
  7. Configuring the Admin Console for the Connector
  8. Testing a Search Query

Setting Up the Connector Development Environment

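In this section, you set up the connector development environment by performing the following activities:

  1. Installing Software Components
  2. Downloading the Connector Manager Software
  3. Extracting the Connector SPI JAR File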

Installing Software Components

In this section, you install the Java Development Kit (JDK), Eclipse, and Apache Tomcat (the servlet container).

To set up the connector development environment:

  1. Install the JDK version 1.5 or later.

    The JDK is available from http://java.sun.com. The connector framework supports any JDK version from 1.5 onward.

  2. Install Eclipse.

    Eclipse is available from http://www.eclipse.org. You can develop connectors using any integrated development environment (IDE) such as Eclipse. For this tutorial, you need Eclipse to complete the tasks. (The procedures use Eclipse version 3.3.2.)

  3. Install Apache Tomcat:
    1. Download an Apache Tomcat ZIP or TAR file from Apache.org and extract it to a folder on your computer. (The Windows executable installer does not provide the startup and shutdown commands described in this document.)
    2. Create an environment variable named CATALINA_HOME.
      Windows XP:
      1. Right click My Computer and click Properties.
      2. Click the Advanced tab and click Environment Variables. The Environment Variables dialog box appears.
      3. In System variables, click New. The New System Variable dialog box appears.
      4. Type CATALINA_HOME as the Variable name, and then enter the path to the Tomcat installation folder, for example, c:\apache-tomcat-6.0.18\, as the Variable value.
      5. Click OK to complete the variable, OK to exit Environment Variables, and OK to exit System Properties.

      Linux and Macintosh:
      1. See if you need to set the variable by typing:
        echo $CATALINA_HOME

        If this command doesn't display a value, continue to the next step.
      2. Set the variable (substitute your Tomcat version number in the commands that follow).
        For example, to set it for the current shell session, type:
        export CATALINA_HOME=/usr/local/apps/apache-tomcat-6.0.18
        To set it for every new shell, add the same line to your ~/.bashrc file:
        export CATALINA_HOME=/usr/local/apps/apache-tomcat-6.0.18

    3. Examine the installed Tomcat file structure and locate the bin and webapps folders, which you use in Deploying the Connector Manager and Connector on Apache Tomcat.
    4. Start Tomcat:
      • Windows: Open a command prompt and type:
        %CATALINA_HOME%\bin\startup
      • Linux or Macintosh: Start a Terminal and type:
        $CATALINA_HOME/bin/startup.sh
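If you want to confirm that CATALINA_HOME points at a Tomcat installation before you continue, the following minimal Java sketch checks for the bin and webapps folders that later steps use and reports which startup script to run. The class name TomcatHomeCheck is hypothetical, and the sketch assumes the standard layout of the Tomcat ZIP or TAR download, in which the bin folder contains startup.bat (Windows) and startup.sh (Linux and Macintosh).

```java
import java.io.File;
import java.util.Locale;

/**
 * Hypothetical helper (not part of Tomcat or the connector manager):
 * checks that a folder looks like a Tomcat installation and reports
 * which startup script to run.
 */
final class TomcatHomeCheck {

  /** Returns true if tomcatHome contains both a bin and a webapps folder. */
  static boolean looksLikeTomcatHome(File tomcatHome) {
    return new File(tomcatHome, "bin").isDirectory()
        && new File(tomcatHome, "webapps").isDirectory();
  }

  /** Returns bin/startup.bat on Windows and bin/startup.sh elsewhere. */
  static File startupScript(File tomcatHome, String osName) {
    String script = osName.toLowerCase(Locale.ROOT).startsWith("windows")
        ? "startup.bat" : "startup.sh";
    return new File(new File(tomcatHome, "bin"), script);
  }

  public static void main(String[] args) {
    String home = System.getenv("CATALINA_HOME");
    if (home == null || home.isEmpty()) {
      System.out.println("CATALINA_HOME is not set.");
      return;
    }
    File tomcatHome = new File(home);
    if (!looksLikeTomcatHome(tomcatHome)) {
      System.out.println(home + " has no bin and webapps folders; check CATALINA_HOME.");
      return;
    }
    System.out.println("Start Tomcat with: "
        + startupScript(tomcatHome, System.getProperty("os.name")));
  }
}
```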


Downloading the Connector Manager Software

In this section, you download the open source connector manager software:

  1. Choose a connector manager download file from http://code.google.com/p/google-enterprise-connector-manager/.

    In the Featured Downloads section, choose which compressed file to download:

    Platform              Binary file
    Linux or Macintosh    connector-manager-2.4.0.tar.gz
    Windows               connector-manager-2.4.0.zip
  2. Download the file for your platform and open the compressed file.

Extracting the Connector SPI JAR File

In this section, after opening the connector manager compressed file from the previous section, you extract the connector manager .war file, make a copy of it, and extract the .jar file for the connector service provider interface (SPI) from the copy. The SPI .jar file provides the interfaces and classes that the example Hello World connector implements and uses.

The instructions that follow are for a Windows system. If you are using Linux or Macintosh, refer to your operating system documentation for how to open a .tar.gz file and how to extract, copy, and rename files.

To extract the connector-spi.jar file:

  1. Open the folder labeled connector-manager-2.4.0.
  2. Copy or extract the connector-manager.war file and store it in a folder on your computer.
  3. Copy the connector-manager.war file so that you have two copies of the same file.
  4. Rename one of the copies of the connector-manager.war file to be connector-manager.zip.
  5. Double-click the connector-manager.zip file, open the WEB-INF folder, and open the lib folder.
  6. Copy or extract the connector-spi.jar file to the current folder on your computer.

    You use the connector-spi.jar file later, in Adding the Connector Manager JAR file to the Eclipse Project.
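Because a .war file is a ZIP archive, you can also extract the SPI JAR without renaming anything, which is convenient on Linux or Macintosh. The JDK's jar tool does this with jar xf connector-manager.war WEB-INF/lib/connector-spi.jar, which writes the file under WEB-INF/lib in the current folder. The following Java sketch does the same with java.util.zip and writes connector-spi.jar directly into a folder you choose. SpiJarExtractor is a hypothetical name, and the sketch assumes the SPI JAR is stored at WEB-INF/lib/connector-spi.jar inside the .war file, as in the steps above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

/**
 * Sketch: copies WEB-INF/lib/connector-spi.jar out of connector-manager.war.
 * A .war file is a ZIP archive, so java.util.zip can read it directly.
 */
final class SpiJarExtractor {

  // Location of the SPI JAR inside the .war file (assumed, per this tutorial).
  static final String SPI_ENTRY = "WEB-INF/lib/connector-spi.jar";

  /** Writes connector-spi.jar into targetDir and returns its path. */
  static Path extractSpiJar(Path warFile, Path targetDir) throws IOException {
    try (ZipFile war = new ZipFile(warFile.toFile())) {
      ZipEntry entry = war.getEntry(SPI_ENTRY);
      if (entry == null) {
        throw new IOException(SPI_ENTRY + " not found in " + warFile);
      }
      Path target = targetDir.resolve("connector-spi.jar");
      try (InputStream in = war.getInputStream(entry)) {
        Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
      }
      return target;
    }
  }

  public static void main(String[] args) throws IOException {
    if (args.length == 0) {
      System.err.println("Usage: java SpiJarExtractor <war file> [target folder]");
      return;
    }
    Path war = Paths.get(args[0]);
    Path targetDir = Paths.get(args.length > 1 ? args[1] : ".");
    System.out.println("Extracted " + extractSpiJar(war, targetDir));
  }
}
```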


Creating a Java Project in Eclipse

In this section, you create a new project in Eclipse for use in this tutorial.

To create a Java project in Eclipse:

  1. Start Eclipse.

    Locate the folder where you installed Eclipse and run the Eclipse application.

  2. Create a new project.

    Click File > New > Project > Java Project and click Next. The New Java Project window appears.

    If the Java perspective is open, you can create the project with File > New > Java Project.

  3. Name the project.

    Enter HelloWorldConnector as the project name. You can leave the other settings as they appear.

  4. Click Finish.

    The new project is created and saved in the default workspace.

  5. View Package Explorer (if the Package Explorer is not already open).

    Click Window > Show View > Package Explorer.

  6. Create a lib folder.
    1. Right click the HelloWorldConnector project name in Package Explorer and click New > Folder.
    2. Enter lib as the folder name and click Finish.


Adding the Connector Manager JAR file to the Eclipse Project

In this section, you add the connector manager's SPI .jar file to your Eclipse project.

To add the .jar file to the lib folder that you created in Step 6 of Creating a Java Project in Eclipse:

  1. Copy the connector-spi.jar file.

    Return to the folder where the connector-spi.jar file resides (as described in Step 6 of Extracting the Connector SPI JAR File) and copy the file to the clipboard.

  2. Click the lib folder.
  3. Paste the connector-spi.jar file.

    In Eclipse, right click the lib folder and click Paste to put the connector-spi.jar file in the lib folder.

To register the connector-spi.jar file with Eclipse:

  1. Refresh the folder.

    Right click the lib folder and click Refresh from the menu. The connector-spi.jar entry appears.

  2. Add to the build path.

    (A build path indicates to Eclipse how to find the code for compiling.) Right click the connector-spi.jar file and click Build Path > Add to Build Path.

    Note: The connector-spi.jar file moves from the lib folder to the Referenced Libraries folder.


Copying the Hello World Connector to Eclipse

Copying the example Hello World connector requires that you create two classes, one for the connector and another for the connector type. A class is a construct for creating objects.

The connector type class identifies the connector to the Admin Console of a Google Search Appliance and provides properties that the connector manager uses to generate a connector instance. These classes will eventually become part of the .jar file that the connector manager uses to create an instance of the connector and to identify the connector to the Admin Console.

To create the class for the connector:

  1. In the Package Explorer, expand the HelloWorldConnector element and right click the src element.
  2. Click New > Class. The New Java Class window opens.
  3. Type in the following information in this window.
    Field Name    Enter This Value
    Package       com.example.connector
    Name          HelloWorldConnector

    You can leave all other settings as they appear. Click Finish.

  4. Open the HelloWorldConnector.java.txt source code file, select all of the text, and copy it. Delete the existing text from the Eclipse source window and paste the copied text into the source window.
    If you want to learn more about the purpose of the source code, see Example Connector Code.
  5. Click File > Save.

To create the class for the connector type:

  1. In the Package Explorer, expand the HelloWorldConnector element and right click the com.example.connector element.
  2. Click New > Class. The New Java Class window opens.
  3. Enter HelloWorldConnectorType in the Name field and click Finish to close the window.
  4. Open the HelloWorldConnectorType.java.txt source code file, select all of the text, and copy it. Delete the existing text from the Eclipse source window and paste the copied text into the source window.
  5. Click File > Save.


Compiling the Connector in Eclipse

By default, Eclipse compiles (builds) projects automatically. If a build does not occur after you paste in the Hello World example, make sure that Project > Build Automatically is selected.

Completing and Testing Your Connector

In this section, you add the XML configuration files to your project, create a JAR file, deploy the file on the Apache Tomcat server, and test the connector.

Deploying the connector consists of:

  1. Creating the XML Files
  2. Creating the Hello World Connector JAR File
  3. Deploying the Connector Manager and Connector on Apache Tomcat
  4. Configuring the Admin Console
  5. Testing a Search Query

Creating the XML Files

In this section, you create the connector instance and connector type XML files. When installed, these files enable the connector manager to create a connector instance and to identify the connector to the Admin Console.

To create the XML files:

  1. In Eclipse, right click the src folder in the HelloWorldConnector project, click New > Folder, specify the Folder name as config, and click Finish.
  2. Right click the config folder, click New > File, enter the file name as connectorInstance.xml, and click Finish.
  3. Click the Source tab at the bottom of the connectorInstance.xml panel.
  4. Open connectorInstance.xml.txt, copy the contents, and paste the text into the connectorInstance.xml panel.
  5. Click File > Save.
  6. Repeat Steps 2 to 5 to create the connectorType.xml file in the config folder using the contents of connectorType.xml.txt file.

Creating the Hello World Connector JAR File

In this section, you create the JAR file that, when installed on the Apache Tomcat server, makes your connector available to the connector manager and the Admin Console.

To create the .jar file:

  1. Right click on the src folder and click Export. The Export window opens. Expand Java, click JAR file, and click Next. The JAR Export window opens.
  2. For Select the resources to export, expand the HelloWorldConnector item and click the checkboxes for the src and lib folders.
  3. For Select the export destination, type in the JAR file name as HelloWorldConnector.jar and click Finish.
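Before you deploy the JAR file, you can check that the export included both the compiled classes and the XML files. The sketch below opens the JAR with java.util.jar.JarFile and reports any expected entry that is missing. The entry list is an assumption based on this tutorial's package name (com.example.connector) and on the src/config folder, which Eclipse normally exports as config/ at the root of the JAR; adjust it if your layout differs. ConnectorJarCheck is a hypothetical name.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.jar.JarFile;

/**
 * Hypothetical check that HelloWorldConnector.jar contains the files this
 * tutorial creates. The expected entry names are assumptions; adjust them
 * to match your package and folder names.
 */
final class ConnectorJarCheck {

  static final List<String> EXPECTED_ENTRIES = Arrays.asList(
      "config/connectorInstance.xml",
      "config/connectorType.xml",
      "com/example/connector/HelloWorldConnector.class",
      "com/example/connector/HelloWorldConnectorType.class");

  /** Returns the expected entries that are not present in the JAR. */
  static List<String> missingEntries(String jarPath) throws IOException {
    List<String> missing = new ArrayList<String>();
    try (JarFile jar = new JarFile(jarPath)) {
      for (String name : EXPECTED_ENTRIES) {
        if (jar.getEntry(name) == null) {
          missing.add(name);
        }
      }
    }
    return missing;
  }

  public static void main(String[] args) throws IOException {
    String jarPath = args.length > 0 ? args[0] : "HelloWorldConnector.jar";
    List<String> missing = missingEntries(jarPath);
    System.out.println(missing.isEmpty()
        ? "All expected entries are present."
        : "Missing entries: " + missing);
  }
}
```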


Deploying the Connector Manager and Connector on Apache Tomcat

In this section, you perform essentially the same steps twice: first you stop Tomcat, copy the connector manager's .war file to the Tomcat server, and restart Tomcat; then you repeat the process for the connector's .jar file. Two passes are needed because, when Tomcat starts, it expands the connector manager's .war file into new folders, and in the second procedure you add the connector to those folders.

Before starting, return to the folder where you installed Tomcat as described in the previous section, Installing Software Components. You will also need the connector-manager.war file described in the previous section, Extracting the Connector SPI JAR File.

Deploying the Connector Manager

To deploy the connector manager on Tomcat:

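  1. Shut down Tomcat by running %CATALINA_HOME%\bin\shutdown (Windows) or $CATALINA_HOME/bin/shutdown.sh (Linux or Macintosh).
  2. Copy the connector-manager.war file to the Tomcat webapps folder.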
  3. Start Tomcat.
  4. Test Tomcat.

    After Tomcat starts, verify that the server is running by browsing to http://127.0.0.1:8080/connector-manager/testConnectivity. This page verifies that the connector manager is running and displays the connector manager version, Java version, the operating system version, and the IP address of the search appliance.
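To script this check instead of using a browser, a minimal sketch with HttpURLConnection might look like the following. ConnectivityCheck is a hypothetical name. The sketch only checks for an HTTP 200 response and prints the body; it makes no assumption about the format of the connector manager's response, and decoding the body as UTF-8 is itself an assumption. The default base URL assumes Tomcat is listening on port 8080 on the local computer, as in the URL above.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;

/**
 * Hypothetical helper: fetches a connector manager URL and returns the
 * response body, failing if the server does not answer with HTTP 200.
 */
final class ConnectivityCheck {

  static String fetch(String url) throws IOException {
    HttpURLConnection conn =
        (HttpURLConnection) URI.create(url).toURL().openConnection();
    conn.setConnectTimeout(5000);  // milliseconds
    conn.setReadTimeout(5000);
    try {
      int status = conn.getResponseCode();
      if (status != HttpURLConnection.HTTP_OK) {
        throw new IOException("HTTP " + status + " from " + url);
      }
      try (InputStream in = conn.getInputStream()) {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
          body.write(chunk, 0, n);
        }
        return body.toString("UTF-8");  // assumes a UTF-8 response
      }
    } finally {
      conn.disconnect();
    }
  }

  public static void main(String[] args) throws IOException {
    // Default assumes Tomcat on the local computer, port 8080.
    String base = args.length > 0 ? args[0] : "http://127.0.0.1:8080/connector-manager";
    System.out.println(fetch(base + "/testConnectivity"));
  }
}
```

You can pass a different base URL as the first argument, for example the one that uses your computer's host name.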


Deploying Your Connector

To deploy your connector on Tomcat:

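  1. Shut down Tomcat.
  2. Copy the HelloWorldConnector.jar file to the webapps/connector-manager/WEB-INF/lib folder under your Tomcat installation folder. Tomcat created this folder when it expanded the connector manager's .war file.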

Do not restart Tomcat yet. You restart it in the next section, Configuring the Admin Console.
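The copy in Step 2 can also be scripted. The following sketch assumes that Tomcat has already expanded connector-manager.war into webapps/connector-manager (which happens when Tomcat starts with the .war file in its webapps folder) and copies the connector JAR into that web application's WEB-INF/lib folder. ConnectorDeployer is a hypothetical name, and the sketch reads the Tomcat location from the CATALINA_HOME variable you set earlier.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

/**
 * Hypothetical helper: copies a connector JAR into the expanded
 * connector-manager web application under a Tomcat installation.
 */
final class ConnectorDeployer {

  /** Copies connectorJar into catalinaHome/webapps/connector-manager/WEB-INF/lib. */
  static Path deploy(Path catalinaHome, Path connectorJar) throws IOException {
    Path libDir = catalinaHome.resolve(
        Paths.get("webapps", "connector-manager", "WEB-INF", "lib"));
    if (!Files.isDirectory(libDir)) {
      // Tomcat creates this folder when it first expands connector-manager.war.
      throw new IOException(libDir + " not found; deploy and start the connector manager first");
    }
    Path target = libDir.resolve(connectorJar.getFileName());
    return Files.copy(connectorJar, target, StandardCopyOption.REPLACE_EXISTING);
  }

  public static void main(String[] args) throws IOException {
    String home = System.getenv("CATALINA_HOME");
    if (home == null) {
      System.err.println("CATALINA_HOME is not set.");
      return;
    }
    Path jar = Paths.get(args.length > 0 ? args[0] : "HelloWorldConnector.jar");
    System.out.println("Copied to " + deploy(Paths.get(home), jar));
  }
}
```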


Configuring the Admin Console

The information that follows explains how to use the Admin Console of the Google Search Appliance to configure access to the connector manager and your connector.

You need the machine name where you installed Apache Tomcat (the servlet container), the domain name for your site, and the administrative user name and password for the Admin Console.

To configure the Admin Console for a connector:

  1. Start Tomcat.
  2. Test Tomcat.

    After Tomcat starts, verify that the server is running by browsing to http://127.0.0.1:8080/connector-manager/testConnectivity. This link verifies that the connector manager is running and displays the connector manager version, Java version, and the operating system version.

  3. Log in to the Admin Console as an administrator.
  4. In the Crawl and Index > Crawl URLs page of the Admin Console, add the following crawl pattern to the Follow and Crawl Only URLs with the Following Patterns edit box:
    ^googleconnector://

    Note: If this crawl pattern is not present, the search appliance rejects documents from a content feed connector, which this connector mimics. Metadata-and-URL connectors do not require the use of the googleconnector crawl pattern.

  5. Click Save URLs to Crawl.
  6. On the Crawl and Index > Feeds > List of Trusted IP Addresses page, either list the IP address of the servlet container in the Only trust feeds from these IP addresses field or click Trust feeds from all IP addresses.

    Click Save Settings.

  7. Click Connector Administration > Connector Managers and enter the following values in Register a New Connector Manager:
    Field           Value
    Manager name    test-cm
    Description     Test connector manager
    Location        Connector_Manager_URL

    For Connector_Manager_URL, use a URL that starts with the name of the computer on which you installed Apache Tomcat, followed by your local site's domain name. For example, if the computer name is mycomputer in the example.com domain, the URL is: http://mycomputer.example.com:8080/connector-manager

  8. Click Save. The Admin Console tests the connectivity between the search appliance and the connector manager.

    When the Admin Console is able to access the connector manager on the network, a green dot appears next to the connector manager name in the Connector Manager Administration page and "New Connector Manager successfully added" displays.

    You can test that the connector type is deployed by browsing to http://127.0.0.1:8080/connector-manager/getConnectorList, which displays the installed connector types.

  9. Click the name of the test-cm connector manager.

    The Connector Administration > Connectors page of the Admin Console appears.

  10. From the Connector manager pull-down menu, choose the test-cm name of the connector manager and click Add New Connector.

    The Connector Configuration page appears.

  11. Enter helloworld-connector as the connector name and click Get Configuration Form.

    The configuration form page appears.

  12. Enter test values (the example connector ignores them) and click Save Configuration.

    Your connector is now added to the Admin Console.


Testing a Search Query

To test a search query:

  1. Wait a few minutes while the search appliance indexes the connector document.
  2. From the Admin Console, click Test Center to start a search.
  3. In the Test Center, set the For Use Frontend pull-down to default_frontend and set the Search over Collection pull-down to default_collection. Click View Output.
  4. Ensure that Search: public content is set and enter hello in the search field.

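    Hello World appears as a search result, but its link does not lead to an actual page, because the connector supplies a simulated display URL (http://www.example.com/?docid=n, where n is the document ID). This result confirms that the connector is working.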

Summary

Congratulations! You just created a simple connector. You can examine the two sample Java files. They contain implementation classes for the connector SPI. You also learned how XML files are packaged as part of the .jar file.

For a more complete example, you can build and deploy the test connector that comes with the connector manager, which contains a mock repository. See Appendix A: Building a Debug Connector Manager for more information.

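For further information about the code examples, see:

  • Example Connector Code
  • Example Connector Type Code
  • Example connectorType.xml Code
  • Example connectorInstance.xml Code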


Example Connector Code

This section explains the source code for the example Hello World connector.

Package and Import Statements

The following statements declare the Java package for the fictional Example connector and import the connector SPI interfaces and classes that the connector uses.

package com.example.connector;
import com.google.enterprise.connector.spi.AuthenticationIdentity;
import com.google.enterprise.connector.spi.AuthenticationManager;
import com.google.enterprise.connector.spi.AuthenticationResponse;
import com.google.enterprise.connector.spi.AuthorizationManager;
import com.google.enterprise.connector.spi.AuthorizationResponse;
import com.google.enterprise.connector.spi.Connector;
import com.google.enterprise.connector.spi.Document;
import com.google.enterprise.connector.spi.DocumentList;
import com.google.enterprise.connector.spi.Property;
import com.google.enterprise.connector.spi.RepositoryException;
import com.google.enterprise.connector.spi.Session;
import com.google.enterprise.connector.spi.SimpleDocument;
import com.google.enterprise.connector.spi.SpiConstants;
import com.google.enterprise.connector.spi.TraversalManager;
import com.google.enterprise.connector.spi.Value;

The next statements import the java.util classes that the connector uses. For information about these classes, see the Java API documentation on the Sun Java site.

import java.util.ArrayList;
import java.util.Calendar;
import java.util.Collection;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;


Declaring the Connector and Session

The next statement declares the connector's HelloWorldConnector class as implementing the Connector interface. This is the start of the connector application code.

public class HelloWorldConnector implements Connector {

The next statements implement the login method, which returns a Session object, and declare the HelloSession class that implements the Session interface. The HelloSession class instantiates managers for traversing the content management system and for authenticating and authorizing users.

public Session login() {
  return new HelloSession();
}
public class HelloSession implements Session {
  public AuthenticationManager getAuthenticationManager() {
    return new HelloAuthenticationManager();
  }
  public AuthorizationManager getAuthorizationManager() {
    return new HelloAuthorizationManager();
  }
  public TraversalManager getTraversalManager() {
    return new HelloTraversalManager();
  }
}

Traversing the Example Content Management System

The HelloTraversalManager class implements the TraversalManager interface. The connector manager calls its setBatchHint method to set the desired number of documents to return in each traversal batch; the example stores this value in the batchHint field, which defaults to 10.

public class HelloTraversalManager implements TraversalManager {
  private static final int MAX_DOCID = 1000;
  private int batchHint = 10;
  public void setBatchHint(int hint) {
    batchHint = hint;
  }


Example startTraversal Method

The connector manager calls the startTraversal method to start traversing a content management system for the first time. This method calls the traverse utility method to build the list of documents.

public DocumentList startTraversal() {
  return traverse("0");
}

Example resumeTraversal Method

The connector manager calls the resumeTraversal method to continue traversing the content management system from a checkpoint, as needed to acquire the remaining documents to index. This method passes the checkpoint to the traverse utility method to build the next list of documents.

public DocumentList resumeTraversal(String checkpoint) {
  return traverse(checkpoint);
}


Traversal Utility Method

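The traverse method returns up to batchHint documents in each batch, starting with the document after the checkpoint, until it has returned a total of MAX_DOCID (1000) documents. After that, it returns null to indicate that there are no more documents.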

/**
 * Utility method to produce a {@code DocumentList} containing
 * the next batch of {@code Document} objects after the checkpoint.
 *
 * @param checkpoint a String representing the last document
 *        number processed.
 * @return the next batch of documents, or null if there are
 *         no more documents.
 */
private DocumentList traverse(String checkpoint) {
  int startDocId = Integer.parseInt(checkpoint) + 1;
  if (startDocId > MAX_DOCID) {
    return null;  // No more documents.
  }
  // Calendar.getInstance() returns the current time. It is created once
  // per batch, so every document in a batch has the same timestamp.
  Calendar cal = Calendar.getInstance();
  List<Document> docList = new ArrayList<Document>(batchHint);
  int endDocId = Math.min(startDocId + batchHint - 1, MAX_DOCID);
  for (int i = startDocId; i <= endDocId; i++) {
    Map<String, List<Value>> properties =
      new HashMap<String, List<Value>>();
    properties.put(SpiConstants.PROPNAME_DOCID,
      asList(Value.getStringValue(Integer.toString(i))));
    properties.put(SpiConstants.PROPNAME_LASTMODIFIED,
      asList(Value.getDateValue(cal)));
    properties.put(SpiConstants.PROPNAME_DISPLAYURL,
      asList(Value.getStringValue("http://www.example.com/?docid=" + i)));
    properties.put(SpiConstants.PROPNAME_CONTENT,
      asList(Value.getBinaryValue("Hello World!".getBytes())));
    docList.add(new SimpleDocument(properties));
  }
  return new HelloWorldDocumentList(docList);
}

// Wraps a single Value in the List<Value> form that SimpleDocument expects.
private List<Value> asList(Value value) {
  List<Value> list = new LinkedList<Value>();
  list.add(value);
  return list;
}
}  // end HelloTraversalManager class
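To see how startTraversal, resumeTraversal, and the checkpoint fit together, the following stand-alone sketch reproduces only the batching arithmetic of the traverse method. It uses plain Integer document IDs in place of SPI Document objects so that it runs without connector-spi.jar. BatchSimulation is a hypothetical name, and the loop in countBatches is a simplified model of a caller resuming from each checkpoint, not the connector manager's actual scheduling logic. It assumes batchHint is greater than zero.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Stand-alone sketch of the batching arithmetic in traverse(). It uses
 * plain Integer document IDs instead of SPI Document objects, so it runs
 * without connector-spi.jar. Assumes batchHint > 0.
 */
final class BatchSimulation {
  static final int MAX_DOCID = 1000;

  /** Returns the IDs that traverse(checkpoint) would produce, or null when done. */
  static List<Integer> nextBatch(String checkpoint, int batchHint) {
    int startDocId = Integer.parseInt(checkpoint) + 1;
    if (startDocId > MAX_DOCID) {
      return null;  // No more documents.
    }
    int endDocId = Math.min(startDocId + batchHint - 1, MAX_DOCID);
    List<Integer> ids = new ArrayList<Integer>();
    for (int i = startDocId; i <= endDocId; i++) {
      ids.add(i);
    }
    return ids;
  }

  /**
   * Simplified model of a caller: start from "0" (as startTraversal does),
   * then resume from the last ID of each batch until traverse returns null.
   */
  static int countBatches(int batchHint) {
    int batches = 0;
    String checkpoint = "0";
    List<Integer> batch;
    while ((batch = nextBatch(checkpoint, batchHint)) != null) {
      batches++;
      checkpoint = Integer.toString(batch.get(batch.size() - 1));
    }
    return batches;
  }

  public static void main(String[] args) {
    System.out.println(countBatches(10));   // prints 100
    System.out.println(countBatches(300));  // prints 4 (300 + 300 + 300 + 100)
  }
}
```

With a batch hint of 10, the 1000 simulated documents arrive in 100 batches; with a hint of 300, they arrive in four batches, the last containing documents 901 through 1000.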


HelloWorldDocumentList Class

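The HelloWorldDocumentList class implements the DocumentList interface. Its nextDocument method iterates through the list of documents from the simulated content management system, and its checkpoint method returns the document ID of the last document that nextDocument returned.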

class HelloWorldDocumentList implements DocumentList {
  private Iterator<Document> iterator;
  private Document document;
  public HelloWorldDocumentList(List<Document> documents) {
    this.iterator = documents.iterator();
    this.document = null;
  }
  public Document nextDocument() {
    if (iterator.hasNext()) {
      document = iterator.next();
      return document;
    }
    return null;
  }
  public String checkpoint() throws RepositoryException {
    if (document != null) {
      Property docId =
        document.findProperty(SpiConstants.PROPNAME_DOCID);
      return docId.nextValue().toString();
    }
    return null;
  }
}
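The checkpoint method returns the ID of the last document handed out by nextDocument, not the last document in the list. The stand-alone sketch below mirrors that logic with String document IDs in place of SPI Document objects (CheckpointingList is a hypothetical name), which makes it easy to see what happens when a caller stops partway through a batch: the checkpoint covers only the documents actually consumed, so a later resumeTraversal call starts with the next unconsumed document.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/**
 * Simplified stand-in for HelloWorldDocumentList that uses String document
 * IDs instead of SPI Document objects, so it runs without connector-spi.jar.
 */
final class CheckpointingList {
  private final Iterator<String> iterator;
  private String lastDocId;  // ID most recently returned by nextDocument()

  CheckpointingList(List<String> docIds) {
    this.iterator = docIds.iterator();
    this.lastDocId = null;
  }

  /** Returns the next document ID, or null when the batch is exhausted. */
  String nextDocument() {
    if (iterator.hasNext()) {
      lastDocId = iterator.next();
      return lastDocId;
    }
    return null;
  }

  /** Returns the ID of the last document returned, or null if none was returned. */
  String checkpoint() {
    return lastDocId;
  }

  public static void main(String[] args) {
    CheckpointingList batch = new CheckpointingList(Arrays.asList("11", "12", "13"));
    batch.nextDocument();
    batch.nextDocument();
    // Stopping early: the checkpoint covers only the documents consumed so far.
    System.out.println(batch.checkpoint());  // prints 12
  }
}
```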

HelloAuthenticationManager Class

The HelloAuthenticationManager class implements the AuthenticationManager interface and reports every user as successfully authenticated.

class HelloAuthenticationManager implements AuthenticationManager {
  public AuthenticationResponse authenticate(AuthenticationIdentity id) {
    return new AuthenticationResponse(true, null);
  }
}


HelloAuthorizationManager Class

The HelloAuthorizationManager class implements the AuthorizationManager interface and indicates that the authenticated user is authorized to view all of the documents in the supplied Collection. This set of statements also concludes the HelloWorldConnector class.

class HelloAuthorizationManager implements AuthorizationManager {
  public Collection<AuthorizationResponse> authorizeDocids(
      Collection<String> docIds, AuthenticationIdentity id) {
    ArrayList<AuthorizationResponse> authorized =
      new ArrayList<AuthorizationResponse>(docIds.size());
    for (String docId : docIds) {
      // Every document is visible to every user in this example.
      authorized.add(new AuthorizationResponse(true, docId));
    }
    return authorized;
  }
}
}    // end HelloWorldConnector class
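In a real connector, authorizeDocids would usually return a different decision for each document depending on the user. The sketch below shows that pattern without the SPI classes: Decision is a stand-in for AuthorizationResponse, and the ACL map (document ID to allowed user names) is a hypothetical data source, not something the example connector or the SPI provides. Like the example, the method returns one decision for each requested document ID.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Sketch of per-document authorization. Decision stands in for the SPI's
 * AuthorizationResponse, and the ACL map is a hypothetical data source.
 */
final class AclAuthorizationSketch {

  /** Stand-in for AuthorizationResponse: a document ID and a yes/no decision. */
  static final class Decision {
    final String docId;
    final boolean allowed;

    Decision(String docId, boolean allowed) {
      this.docId = docId;
      this.allowed = allowed;
    }
  }

  /** Hypothetical ACL: document ID mapped to the user names allowed to view it. */
  private final Map<String, Set<String>> acl;

  AclAuthorizationSketch(Map<String, Set<String>> acl) {
    this.acl = acl;
  }

  /** Returns one decision per requested document ID; unknown documents are denied. */
  List<Decision> authorizeDocids(Collection<String> docIds, String userName) {
    List<Decision> decisions = new ArrayList<Decision>(docIds.size());
    for (String docId : docIds) {
      Set<String> readers = acl.get(docId);
      decisions.add(new Decision(docId, readers != null && readers.contains(userName)));
    }
    return decisions;
  }
}
```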


Example Connector Type Code

The code for the example connector type creates a simple configuration form as a row in an XHTML table and demonstrates the use of the following methods:

  • getConfigForm
  • validateConfig
  • getPopulatedConfigForm

The example connector type code is as follows:

package com.example.connector;

import com.google.enterprise.connector.spi.ConfigureResponse;
import com.google.enterprise.connector.spi.ConnectorFactory;
import com.google.enterprise.connector.spi.ConnectorType;

import java.util.Locale;
import java.util.Map;

public class HelloWorldConnectorType implements ConnectorType {
  String form = "<tr><td>fake form</td><td><input type=\"text\" name=\"aa\"/></td></tr>";

  public ConfigureResponse getConfigForm(Locale locale) {
    return new ConfigureResponse(null, form);
  }

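  public ConfigureResponse validateConfig(Map<String, String> config, Locale locale,
      ConnectorFactory factory) {
    // Returning null indicates that the configuration is valid;
    // this example accepts any values.
    return null;
  }

  public ConfigureResponse getPopulatedConfigForm(Map<String, String> config, Locale locale) {
    // Returns the unchanged form with the message "filled"; the saved
    // values in config are not inserted into the form.
    return new ConfigureResponse("filled", form);
  }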
}
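The example's getPopulatedConfigForm returns the unchanged form. If you want the populated form to show the values that the administrator saved, one approach is to build each row from the config map and escape the values before inserting them into the HTML. The sketch below is a hypothetical helper (ConfigFormSketch is not part of the SPI) that produces a row in the same style as the example's form string; with an empty map and the label "fake form" it reproduces that string exactly, and its result could be passed as the form snippet argument of ConfigureResponse.

```java
import java.util.Map;

/** Hypothetical helper that builds a pre-filled configuration form row. */
final class ConfigFormSketch {

  /** Escapes the characters that are special in HTML text and attribute values. */
  static String escapeHtml(String s) {
    StringBuilder sb = new StringBuilder(s.length());
    for (char c : s.toCharArray()) {
      switch (c) {
        case '&': sb.append("&amp;"); break;
        case '<': sb.append("&lt;"); break;
        case '>': sb.append("&gt;"); break;
        case '"': sb.append("&quot;"); break;
        default: sb.append(c);
      }
    }
    return sb.toString();
  }

  /**
   * Returns a table row with a label and a text input named fieldName,
   * pre-filled with config.get(fieldName) when that value exists.
   */
  static String formRow(String label, String fieldName, Map<String, String> config) {
    String value = config.get(fieldName);
    String valueAttr = (value == null) ? "" : " value=\"" + escapeHtml(value) + "\"";
    return "<tr><td>" + escapeHtml(label) + "</td><td><input type=\"text\" name=\""
        + escapeHtml(fieldName) + "\"" + valueAttr + "/></td></tr>";
  }
}
```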


Example connectorType.xml Code

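The connectorType.xml file is a Spring bean definition file that identifies the ConnectorType implementation class (HelloWorldConnectorType) for the helloworld-connector bean. You may choose to expose the configuration form fields here or not, as required by your design.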

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE beans PUBLIC "-//SPRING//DTD BEAN//EN" "http://www.springframework.org/dtd/spring-beans.dtd">
<beans>
  <bean id="helloworld-connector"
    class="com.example.connector.HelloWorldConnectorType">
  </bean>
</beans>
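Because connectorType.xml and connectorInstance.xml are ordinary Spring bean definition files, you can inspect them with the JDK's DOM parser, for example to confirm which class a bean ID refers to. The sketch below (BeanFileReader is a hypothetical name) supplies an EntityResolver that replaces the Spring DTD with an empty document, so parsing does not try to download the DTD. It only reads the files; it is not how the connector manager itself loads them.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/**
 * Hypothetical helper: reads the class attribute of a bean in a Spring
 * bean file such as connectorType.xml or connectorInstance.xml.
 */
final class BeanFileReader {

  /** Returns the class of the bean with the given id, or null if there is none. */
  static String beanClass(String xml, String beanId)
      throws ParserConfigurationException, SAXException, IOException {
    DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
    // Replace the external Spring DTD with an empty one so that parsing
    // does not need network access.
    builder.setEntityResolver(
        (publicId, systemId) -> new InputSource(new StringReader("")));
    Document doc = builder.parse(new InputSource(new StringReader(xml)));
    NodeList beans = doc.getElementsByTagName("bean");
    for (int i = 0; i < beans.getLength(); i++) {
      Element bean = (Element) beans.item(i);
      if (beanId.equals(bean.getAttribute("id"))) {
        return bean.getAttribute("class");
      }
    }
    return null;
  }

  public static void main(String[] args) throws Exception {
    // Usage: java BeanFileReader path/to/connectorType.xml helloworld-connector
    String xml = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
    System.out.println(beanClass(xml, args[1]));
  }
}
```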

Example connectorInstance.xml Code

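The connectorInstance.xml file identifies the connector class (HelloWorldConnector) that the connector manager instantiates for each connector instance, and it is where you can set parameters for use in deploying a connector.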

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE beans PUBLIC "-//SPRING//DTD BEAN//EN" "http://www.springframework.org/dtd/spring-beans.dtd">
<beans>
  <bean id="helloworld-connector"
    class="com.example.connector.HelloWorldConnector">
  </bean>
</beans>


Previous Chapter: Introduction
Next Chapter: SPI Overview