Indexing and Querying Data

The Couchbase Elasticsearch plug-in uses the cross data center replication (XDCR) feature in Couchbase Server. This feature can transmit all documents from a Couchbase data bucket or server cluster to another cluster. In this case, you use XDCR to transmit documents from Couchbase to Elasticsearch. As soon as these documents have been transmitted, the Elasticsearch engine indexes them.

Setting up XDCR Replication to Elasticsearch

To set up XDCR replication from Couchbase Server to Elasticsearch:

  1. Open the Couchbase web console and log in.
  2. Click on the XDCR tab.

    Under this tab you can configure and start data replication between a source and destination cluster. In this case the source cluster is a Couchbase cluster and the destination cluster is Elasticsearch.

  3. Click on Create Cluster Reference.

    A panel appears where you can specify information for your Elasticsearch cluster. This is the Elasticsearch cluster where Couchbase Server will send copies of documents from a data bucket to be indexed.

  4. Enter a name, host name, user name, and password for your Elasticsearch cluster, then click Save.

    Be aware that the plug-in listens for XDCR traffic on port 9091 by default, which is not a standard port for those familiar with Couchbase Server. Elasticsearch itself answers search requests on port 9200.

    The reference to the new cluster will appear in the Remote Clusters list under the XDCR tab.

  5. To set up replication, click Create Replication.

    A panel appears where you can establish replication from your Couchbase cluster to Elasticsearch.

  6. Under Replicate changes from: Bucket, choose beer-sample.
  7. Under the section To: select Elasticsearch.
  8. For Bucket: enter beer-sample. This is actually the Elasticsearch index where the data will be sent for indexing.
  9. If you are using Couchbase Server 2.2 or later, click Advanced settings and change the XDCR Protocol setting to Version 1.
  10. Finally, click Replicate to start replication of documents to Elasticsearch. Couchbase Server will begin sending data from the beer-sample bucket to your Elasticsearch cluster.

    Under the Ongoing Replications section, you will see the replication and its status.

  11. You can also view the data transfer by clicking the Overview tab of Elasticsearch head.

    The docs field indicates the number of items that have been indexed by Elasticsearch. At this point you can begin querying data from Elasticsearch.
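
The same setup can in principle be scripted instead of clicked through. The sketch below only builds the two form bodies that would be POSTed to the Couchbase REST API on the admin port. It assumes the endpoints /pools/default/remoteClusters and /controller/createReplication and assumes that type=capi corresponds to the "Version 1" XDCR protocol chosen in step 9. The cluster name, host, and credentials are placeholders, so verify everything against the REST documentation for your server version.

```python
from urllib.parse import urlencode

# Assumed Couchbase REST endpoints (relative to the admin port, typically 8091).
REMOTE_CLUSTERS_PATH = "/pools/default/remoteClusters"
CREATE_REPLICATION_PATH = "/controller/createReplication"


def remote_cluster_form(name, hostname, username, password):
    """Form body for the cluster reference (steps 3 and 4)."""
    return urlencode({
        "name": name,
        "hostname": hostname,  # host:port of the plug-in, e.g. port 9091
        "username": username,
        "password": password,
    })


def replication_form(from_bucket, to_cluster, to_bucket):
    """Form body for the replication itself (steps 5 to 10)."""
    return urlencode({
        "fromBucket": from_bucket,
        "toCluster": to_cluster,
        "toBucket": to_bucket,  # this is the Elasticsearch index name
        "replicationType": "continuous",
        "type": "capi",  # assumption: matches XDCR Protocol "Version 1"
    })


if __name__ == "__main__":
    # Placeholder values; substitute your own cluster name and credentials.
    print(REMOTE_CLUSTERS_PATH,
          remote_cluster_form("Elasticsearch", "localhost:9091",
                              "Administrator", "password"))
    print(CREATE_REPLICATION_PATH,
          replication_form("beer-sample", "Elasticsearch", "beer-sample"))
```

Keeping the payload construction separate from the HTTP call makes it easy to inspect exactly what would be sent before touching a live cluster.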

The number of documents displayed by Elasticsearch head may be greater than the actual number of documents in Couchbase Server. XDCR and the Couchbase Plug-in for Elasticsearch also send additional documents that describe the status of replication, and Elasticsearch head includes them in its total. There is an alternate, more accurate way to determine the true number of documents indexed by Elasticsearch, which excludes these extra status documents. You can use this method to debug possible data transfer issues between Couchbase and Elasticsearch.
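
One possible way to get such a count is to ask Elasticsearch for only the documents of type couchbaseDocument, the type shown in the query results in this guide. The following is a minimal sketch rather than the plug-in's documented procedure. It assumes that the status documents are stored under a different type and that your Elasticsearch version accepts a type in the _count path. The helper names are illustrative.

```python
import json
from urllib.parse import quote


def count_url(base_url, index, doc_type="couchbaseDocument"):
    """Build a _count URL restricted to a single document type."""
    return "{}/{}/{}/_count".format(
        base_url.rstrip("/"), quote(index, safe=""), quote(doc_type, safe="")
    )


def parse_count(body):
    """Extract the 'count' field from a _count response body (JSON text)."""
    return int(json.loads(body)["count"])


if __name__ == "__main__":
    # Fetch this URL with any REST client and pass the body to parse_count().
    print(count_url("http://localhost:9200", "beer-sample"))
```

Comparing this number with the item count that Couchbase Web Console reports for the bucket gives a quick check on whether replication has caught up.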

Querying Data

To query Elasticsearch, you send either a simple Lucene-based query string or a request in the more extensive JSON-based query syntax, the Query DSL. Either way, the query is an HTTP request that you can send from any REST client, or as a URI in a browser:

curl http://localhost:9200/beer-sample/_search?q=blueberry
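
The same request can be issued from a script. This is a minimal sketch using only the Python standard library. The function names are illustrative, and the host and port are taken from the curl example above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def search_url(base_url, index, text):
    """Build the URI search URL, URL-encoding the query text."""
    return "{}/{}/_search?{}".format(
        base_url.rstrip("/"), index, urlencode({"q": text})
    )


def search(base_url, index, text, timeout=10):
    """Send the query and return the decoded JSON response as a dict."""
    with urlopen(search_url(base_url, index, text), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Requires a running Elasticsearch node that holds the beer-sample index.
    print(search_url("http://localhost:9200", "beer-sample", "blueberry"))
```

Using urlencode matters once the search text contains spaces or reserved characters, which a hand-built URL would not escape.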

Elasticsearch will return a result set as JSON as follows:

{"took":2,
"timed_out":false,
....
        "hits" : 8,
    ....
        {
        ....
        "_index":"beer-sample",
        "_type":"couchbaseDocument",
        "_id":"dark_horse_brewing_co-tres_blueberry_stout",
        "_score":1.8963704,
        "_source": ....
        ....
        "_index":"beer-sample",
        "_type":"couchbaseDocument",
        "_id":"yegua_creek_brewing_dallas-blueberry_blonde",
        "_score":1.2890494,
        "_source": ....
        ....
        }
}

For the sake of brevity we show just the first two results out of a result set containing eight hits. Each item has a "_score" field, which Elasticsearch uses to indicate how relevant the hit is to the search. Notice that the "_source" attribute contains only metadata saved by Elasticsearch rather than the entire document contents. This is deliberate: Couchbase Server provides very fast access to documents, so you use the "_id" returned by Elasticsearch to retrieve the full document from Couchbase Server. To start, view the document using Couchbase Web Console:

  1. Copy one of the document IDs returned by Elasticsearch, for instance dark_horse_brewing_co-tres_blueberry_stout.
  2. Click on the Data Bucket tab in Couchbase Web Console. A table appears with a list of all Couchbase Buckets.
  3. Click on the Documents button for the beer-sample bucket. A table appears which displays all documents in the bucket.
  4. In the Document ID field, paste the document ID dark_horse_brewing_co-tres_blueberry_stout. The JSON document for that beer will appear. You can click on the document name to view the entire JSON document.

Elasticsearch supports more complex queries through its REST API; for instance, you can search the beer database for the style ‘lambic’ and for ‘blueberry’ in the description. In this case you send an HTTP POST request. The JSON request will appear as follows:

{
    "query": {
        "query_string": {
            "query": "style: lambic AND description: blueberry"
        }
    }
}

Here we scope the search so that it looks for ‘lambic’ in the style field and ‘blueberry’ in the description and we get this result:

{
    "name" : "Wild Blueberry Lager",
    "abv" : 8,
    "brewery_id" : "110f01",
    "description" : "....blueberry aroma....",
    "style" : "Belgian Fruity Lambic"
    ....
}
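
A field-scoped query like this one could be sent from a script as sketched below. The sketch builds a POST request whose body holds a single query_string clause wrapping the Lucene expression. The function name is illustrative, and the host and port are assumed to match the earlier curl example.

```python
import json
from urllib.request import Request, urlopen


def build_search_request(base_url, index, lucene_query):
    """Build a POST request carrying a query_string search body."""
    body = {"query": {"query_string": {"query": lucene_query}}}
    return Request(
        "{}/{}/_search".format(base_url.rstrip("/"), index),
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Requires a running Elasticsearch node that holds the beer-sample index.
    request = build_search_request(
        "http://localhost:9200",
        "beer-sample",
        "style: lambic AND description: blueberry",
    )
    with urlopen(request, timeout=10) as resp:
        print(json.loads(resp.read().decode("utf-8")))
```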

Rather than using the web console to retrieve a document, you would typically use a Couchbase SDK to retrieve the documents by their IDs. Each SDK provides methods and functions to retrieve one or more items based on their IDs. For more information about reading and writing data from an application with Couchbase SDKs, see Couchbase Developer Guides.
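
The lookup pattern can be sketched independently of any particular SDK. The code below collects the document IDs from a decoded search response and passes each one to a get_document callable. That callable is a hypothetical stand-in for whichever key-based get call your Couchbase SDK provides, and the sketch assumes the usual response layout in which the matches are listed under hits.hits.

```python
def hit_ids(response):
    """Return the _id of every hit, in the order Elasticsearch ranked them."""
    return [hit["_id"] for hit in response.get("hits", {}).get("hits", [])]


def fetch_documents(ids, get_document):
    """Look up each ID with the supplied callable (hypothetical SDK wrapper)."""
    return {doc_id: get_document(doc_id) for doc_id in ids}


if __name__ == "__main__":
    # Trimmed-down response using the two IDs shown earlier in this guide.
    response = {
        "hits": {
            "total": 8,
            "hits": [
                {"_id": "dark_horse_brewing_co-tres_blueberry_stout"},
                {"_id": "yegua_creek_brewing_dallas-blueberry_blonde"},
            ],
        }
    }
    # A dict stands in for the bucket here; real code would call the SDK.
    fake_bucket = {doc_id: {"type": "beer"} for doc_id in hit_ids(response)}
    print(fetch_documents(hit_ids(response), fake_bucket.get))
```

Passing the getter in as a parameter keeps the sketch free of SDK-specific calls, which differ between SDK languages and versions.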

For more information about the JSON request and response documents for Elasticsearch, see Elasticsearch, Search API.