Easily provide repository data to a Google Search Appliance (GSA).

Packages
com.google.enterprise.adaptor Adaptor interfaces and implementation.
com.google.enterprise.adaptor.examples  
com.google.enterprise.adaptor.experimental  
com.google.enterprise.adaptor.prebuilt  

 

Note: To write an adaptor in a language other than Java, take a look at CommandLineAdaptor.

Basic GSA Setup

  1. Add the IP address of the computer that hosts the adaptor to the List of Trusted IP Addresses on the GSA.

    In the GSA's Admin Console, go to Content Sources > Feeds, and scroll down to List of Trusted IP Addresses. Add the IP address for the adaptor to the list.

  2. Add the URLs provided by the adaptor to the Follow Patterns on the GSA.

    In the Admin Console, go to Content Sources > Web Crawl > Start and Block URLs, and scroll down to Follow Patterns. Add an entry like hostname:port/ where hostname is the hostname of the machine that hosts the adaptor and port is the adaptor's port, which defaults to 5678 (read on to change the port number).

Running the Adaptor Template, as an initial test

  1. You should have already installed JDK 6 or higher and downloaded a plexi release (from https://code.google.com/p/plexi/). From the downloaded release zip file, use the extracted adaptor jar (eg: adaptor-20130612-withlib.jar) and the extracted adaptor examples jar (eg: examples/adaptor-20130612-examples.jar). If you are working from source code instead of a release, you can build the required jars by running:
    ant dist
    cd dist

    The needed jars will be in a zip file within the current directory (eg: adaptor-20130612-bin.zip will have adaptor-20130612-withlib.jar and examples/adaptor-20130612-examples.jar).

  2. Create an adaptor-config.properties text file in the current directory that looks like:
    gsa.hostname=mygsahostname

    You should replace mygsahostname with the hostname or IP of your GSA. This file allows you to do other configuration of the adaptor library like changing the server port and feed name:

    gsa.hostname=mygsahostname
    server.port=6677
    feed.name=mydocfeedtogsa

    Later, if you have trouble with the adaptor library incorrectly auto-detecting your computer's hostname, then you may need to add a line like:

    server.hostname=yourcomputershostname

    For a list and explanation of the available configuration options, see Config.

  3. Start the Adaptor Template. Note that the jar files you have may have a different date in their names. For Windows:
    java -cp adaptor-20130612-withlib.jar;examples/adaptor-20130612-examples.jar com.google.enterprise.adaptor.examples.AdaptorTemplate
    For all other OSes:
    java -cp adaptor-20130612-withlib.jar:examples/adaptor-20130612-examples.jar com.google.enterprise.adaptor.examples.AdaptorTemplate
  4. Ensure crawling is enabled on your GSA.

    Go to Content Sources > Diagnostics > Crawl Status and click Resume Crawl if the crawling system is currently paused.

  5. Confirm things ran successfully.

    In the GSA, go to Content Sources > Feeds. In the Current Feeds section, you should see an entry for an "adaptor_HOSTNAME_PORT" feed (the name can be changed by setting the feed.name configuration variable).

    In the adaptor log, look for document ids being pushed and requests for document contents being served.
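
The adaptor-config.properties file created in step 2 consists of key=value lines, so it can be sanity-checked with java.util.Properties before starting the adaptor. The following is a minimal sketch under that assumption. It is not how the adaptor library itself loads its configuration, and the class and method names are hypothetical. The fallback port of 5678 is the default mentioned in Basic GSA Setup.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical helper for inspecting adaptor-config.properties style text. */
final class ConfigFileSketch {
  /** Parses text written as key=value lines (Java properties syntax). */
  static Properties parse(String text) {
    Properties props = new Properties();
    try {
      props.load(new StringReader(text));
    } catch (IOException e) {
      // Reading from an in-memory string should not fail; rethrow unchecked.
      throw new UncheckedIOException(e);
    }
    return props;
  }

  /** Returns server.port, falling back to the documented default of 5678. */
  static int serverPort(Properties props) {
    return Integer.parseInt(props.getProperty("server.port", "5678").trim());
  }

  public static void main(String[] args) {
    Properties props = parse(
        "gsa.hostname=mygsahostname\n"
        + "server.port=6677\n"
        + "feed.name=mydocfeedtogsa\n");
    System.out.println("gsa.hostname = " + props.getProperty("gsa.hostname"));
    System.out.println("server.port  = " + serverPort(props));
    System.out.println("feed.name    = " + props.getProperty("feed.name"));
  }
}
```

A check like this catches a misspelled key or a non-numeric port before the adaptor is started.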

Creating your own Adaptor

  1. Review JavaDoc for Adaptor and AbstractAdaptor.
  2. From the source zip file (eg: adaptor-20130612-src.zip), copy src/com/google/enterprise/adaptor/examples/AdaptorTemplate.java into your own package under a new class name. You will need to update the package declaration and class name in the copy accordingly.
  3. Compile, run, and verify the copied adaptor using your favorite IDE. You will only need adaptor-20130612-withlib.jar in your classpath. Note that the date may be different.
  4. Modify it further for your own repository.
  5. Declare success for getting content from your custom repository to the GSA.

Testing Tip

An adaptor, by default, will deny all document accesses, except from the GSA. To allow debugging and testing an adaptor without a GSA, you can add a hostname to the server.fullAccessHosts config key to allow that computer full access to all adaptor content. In addition, this setting allows that computer to see metadata and other GSA-specific information as HTTP headers. This can be very useful when combined with Firebug or the Web Inspector in your browser to observe an Adaptor's behavior.
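
If you prefer to inspect those headers without a browser, a small Java program can print them. This is a minimal sketch, assuming it runs on a machine listed in server.fullAccessHosts. The class and method names are hypothetical, and the document URL is taken from the command line because its exact form depends on your adaptor.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Hypothetical helper that prints the HTTP response headers of a URL. */
final class HeaderDumpSketch {
  /** Flattens a header map into sorted "Name: value" lines. */
  static List<String> formatHeaders(Map<String, List<String>> headers) {
    List<String> lines = new ArrayList<String>();
    for (Map.Entry<String, List<String>> e : headers.entrySet()) {
      // HttpURLConnection reports the status line under a null key.
      String name = e.getKey() == null ? "(status)" : e.getKey();
      for (String value : e.getValue()) {
        lines.add(name + ": " + value);
      }
    }
    Collections.sort(lines);
    return lines;
  }

  /** Issues a GET request and returns the response headers. */
  static Map<String, List<String>> fetchHeaders(String url) throws IOException {
    HttpURLConnection conn =
        (HttpURLConnection) URI.create(url).toURL().openConnection();
    try {
      conn.setRequestMethod("GET");
      return conn.getHeaderFields();
    } finally {
      conn.disconnect();
    }
  }

  public static void main(String[] args) throws IOException {
    if (args.length != 1) {
      System.err.println("usage: java HeaderDumpSketch <document-url>");
      return;
    }
    for (String line : formatHeaders(fetchHeaders(args[0]))) {
      System.out.println(line);
    }
  }
}
```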

Admin Tip

You can set configuration variables on the command line instead of in adaptor-config.properties. You are allowed multiple arguments of the form "-Dconfigkey=configvalue". When providing a value on the command line, it overrides the default value and the value (if any) in the configuration file. For example:

java -cp adaptor-20130612-withlib.jar:examples/adaptor-20130612-examples.jar \
    com.google.enterprise.adaptor.examples.AdaptorTemplate -Dgsa.hostname=mygsahostname \
    -Dserver.port=6677
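
The precedence described above (default value, then configuration file, then command line) can be illustrated with a short sketch. This is an illustration only, not the adaptor library's own parser. The class name, the method name, and the choice to ignore arguments that are not of the form -Dconfigkey=configvalue are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of default < file < command-line precedence. */
final class ConfigPrecedenceSketch {
  /** Merges the three sources; later sources override earlier ones. */
  static Map<String, String> resolve(Map<String, String> defaults,
      Map<String, String> fromFile, List<String> args) {
    Map<String, String> merged = new LinkedHashMap<String, String>(defaults);
    merged.putAll(fromFile);
    for (String arg : args) {
      int eq = arg.indexOf('=');
      // Only arguments shaped like -Dconfigkey=configvalue are considered.
      if (arg.startsWith("-D") && eq > 2) {
        merged.put(arg.substring(2, eq), arg.substring(eq + 1));
      }
    }
    return merged;
  }

  public static void main(String[] args) {
    Map<String, String> defaults = new LinkedHashMap<String, String>();
    defaults.put("server.port", "5678");
    Map<String, String> fromFile = new LinkedHashMap<String, String>();
    fromFile.put("gsa.hostname", "mygsahostname");
    fromFile.put("server.port", "6677");
    // The command-line value for server.port wins over the file and default.
    System.out.println(resolve(defaults, fromFile,
        Arrays.asList("-Dserver.port=7788")));
  }
}
```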

Running as a Windows Service

Download and extract prunsrv.exe from the latest Windows binary download of Apache Commons Daemon. If you are running on 64-bit Windows and will use a 64-bit JVM, then you should use the prunsrv.exe in the amd64/ directory. Place prunsrv.exe in the same directory as the adaptor you would like to run as a service.

You can then register the service:

prunsrv install someadaptor --StartPath="%CD%" ^
  --Classpath=someadaptor-withlib.jar ^
  --StartMode=jvm --StartClass=com.google.enterprise.adaptor.Daemon ^
  --StartMethod=serviceStart --StartParams=package.SomeAdaptor ^
  --StopMode=jvm --StopClass=com.google.enterprise.adaptor.Daemon ^
  --StopMethod=serviceStop --StdOutput=stdout.log --StdError=stderr.log ^
  ++JvmOptions=-Djava.util.logging.config.file=logging.properties ^
  --Startup=auto

Where someadaptor is a unique, arbitrary service name.

To start the service, use the Windows service management tool or run:

prunsrv start someadaptor

Where someadaptor is the same service name used during registration.
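
With --StartMode=jvm and --StopMode=jvm, prunsrv calls static methods that take a String[] argument on the named start and stop classes. The sketch below shows the general shape of such a pair of entry points. It is a hypothetical illustration and not the source of com.google.enterprise.adaptor.Daemon. The worker thread only waits for the stop signal, and whether the start method should return or block is an assumption to check against the Apache Commons Daemon documentation.

```java
import java.util.concurrent.CountDownLatch;

/** Hypothetical class with procrun-style static start and stop methods. */
final class ServiceSketch {
  private static volatile CountDownLatch stopSignal;
  private static volatile Thread worker;

  /** Start entry point: launches a worker thread and returns. */
  public static void serviceStart(String[] args) {
    final CountDownLatch signal = new CountDownLatch(1);
    Thread t = new Thread(new Runnable() {
      @Override
      public void run() {
        try {
          // Placeholder for real work; this thread only waits for stop.
          signal.await();
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      }
    }, "service-worker");
    stopSignal = signal;
    worker = t;
    t.start();
  }

  /** Stop entry point: signals the worker and waits for it to finish. */
  public static void serviceStop(String[] args) {
    CountDownLatch signal = stopSignal;
    Thread t = worker;
    if (signal == null || t == null) {
      return; // Never started.
    }
    signal.countDown();
    try {
      t.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  /** Reports whether the worker thread is still alive. */
  static boolean isRunning() {
    Thread t = worker;
    return t != null && t.isAlive();
  }
}
```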

Enabling Security

Security is not enabled by default because it requires a reasonable amount of setup, on both the GSA and adaptor. The GSA needs a valid certificate for the hostname you are accessing it with (gsa.hostname). Thus, the default one it ships with cannot be valid and you need to generate a new one. Setting up security is required before users can access non-public documents directly from the adaptor.

Creating Self-Signed Certificates

In the GSA's Admin Console, go to Administration > SSL Settings. Under the Create a New SSL Certificate heading, change Host Name to the GSA's hostname, written exactly as the adaptor will use it. Then click Create Self-Signed Certificate and wait for the operation to complete. Then click Install SSL Certificate and wait for that operation to complete (about 1 minute). You now have a valid self-signed certificate, but the adaptor does not yet trust it.

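You need to get the GSA's freshly created certificate so that you can add it as a trusted host for the adaptor. The import command under Exchanging Certificates reads it from a file named gsa.crt in the adaptor's directory.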

Now you should generate a self-signed certificate for the adaptor and export the newly created certificate. Within the adaptor's directory, you should run:

keytool -genkeypair -keystore keys.jks -storepass changeit -keypass changeit -alias adaptor -keyalg RSA -validity 365

For "What is your first and last name?", you should enter the hostname of the adaptor's computer. You are free to answer the other questions however you wish (including not answering them). When you are happy with your answers, answer "yes" to "Is CN=yourcomputershostname, OU=... correct?"

Then, still in the adaptor's directory, you should run:

keytool -exportcert -alias adaptor -keystore keys.jks -storepass changeit -keypass changeit -rfc -file adaptor.crt

Copy cacerts from Java to the adaptor's directory. For Windows:

copy PATH\TO\JRE\lib\security\cacerts cacerts.jks

For all other OSes:

cp PATH/TO/JRE/lib/security/cacerts cacerts.jks

To allow the adaptor to trust itself, execute:

keytool -importcert -keystore cacerts.jks -storepass changeit -file adaptor.crt -alias adaptor

Answer "yes" to "Trust this certificate?"

Exchanging Certificates

To allow the adaptor to trust the GSA, execute:

keytool -importcert -keystore cacerts.jks -storepass changeit -file gsa.crt -alias gsa

Answer "yes" to "Trust this certificate?"
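
After both imports, cacerts.jks should contain entries under the aliases adaptor and gsa. The following minimal sketch checks for them with java.security.KeyStore. The class and method names are hypothetical. The file name, password, and aliases are the ones used in the keytool commands above.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import java.security.KeyStoreException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical check that a trust store contains the expected aliases. */
final class TrustStoreCheckSketch {
  /** Returns the requested aliases that are absent from the store. */
  static List<String> missingAliases(KeyStore store, String... aliases) {
    List<String> missing = new ArrayList<String>();
    for (String alias : aliases) {
      try {
        if (!store.containsAlias(alias)) {
          missing.add(alias);
        }
      } catch (KeyStoreException e) {
        // Thrown only when the store has not been loaded yet.
        throw new IllegalStateException("key store is not loaded", e);
      }
    }
    return missing;
  }

  /** Loads a JKS key store from a file. */
  static KeyStore load(String path, char[] password)
      throws IOException, GeneralSecurityException {
    KeyStore store = KeyStore.getInstance("JKS");
    try (InputStream in = new FileInputStream(path)) {
      store.load(in, password);
    }
    return store;
  }

  public static void main(String[] args) throws Exception {
    KeyStore store = load("cacerts.jks", "changeit".toCharArray());
    System.out.println(
        "Missing aliases: " + missingAliases(store, "adaptor", "gsa"));
  }
}
```

An empty list means both certificates were imported under the expected aliases.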

To allow the GSA to trust the adaptor, within the GSA's Admin Console, go to Administration > Certificate Authorities. Click the Choose File button (it may instead be labeled "Browse...") under the Add more Certificate Authorities heading. Choose "adaptor.crt" in the adaptor's directory and click Save Settings.

Flipping the Switch

Now that everything is prepared, you can flip the security switch with the adaptor by adding a line to your adaptor-config.properties:

server.secure=true

The adaptor can now use the GSA's authentication configuration and will use HTTPS for all communication.

Example command line to run secure:

    java \
    -Djava.util.logging.config.file=src/logging.properties \
    -Djavax.net.ssl.keyStore=keys.jks \
    -Djavax.net.ssl.keyStoreType=jks \
    -Djavax.net.ssl.keyStorePassword=changeit \
    -Djavax.net.ssl.trustStore=cacerts.jks \
    -Djavax.net.ssl.trustStoreType=jks \
    -Djavax.net.ssl.trustStorePassword=changeit \
    -classpath 'adaptor-20130612-withlib.jar:examples/adaptor-20130612-examples.jar' \
    com.google.enterprise.adaptor.examples.AdaptorWithCrawlTimeMetadataTemplate
  

Enable Stricter Security (optional)

There are additional security options you can control on the GSA. You may want to try running an adaptor with server.secure set before enabling these stricter features. Within the GSA's Admin Console, go to Administration > SSL Settings. There you can:

Click Save Setup to save your changes.

Note: By using these settings you improve security, but also require all adaptors to be configured for security and have server.secure=true in their configuration.