Elasticsearch Plug-in 2.1

Elasticsearch Plug-in 2.1

The Couchbase Elasticsearch plug-in allows you to use Couchbase Server with Elasticsearch, an open source search engine. The plug-in enables you to provide rapid, full-text search results in your application.

Your users can retrieve application documents from Couchbase Server based on text in your documents. For example, if your application is a product catalog, users can find items based on text descriptions of the products.

The data model for Elasticsearch is compatible with the schema-free, document-oriented model used by Couchbase Server. Because search is often a CPU-intensive process, you can scale your Elasticsearch cluster separately from your Couchbase cluster to best meet the demands of your users. In doing so, search functions do not slow the performance of Couchbase Server reads or writes.

How the Plug-in Works

The plug-in continuously streams data between Couchbase Server and Elasticsearch. Any document changes made in Couchbase Server are sent to Elasticsearch via the plug-in, which ensures your search index contains the most current items in your system. The plug-in enables the following:

  • Real-Time replication. The plug-in continuously transfers data to the search cluster after Couchbase Server writes the data to disk. That helps keep your search index on Elasticsearch current with the information in Couchbase Server.
  • Topology awareness. Using the plug-in, the system can handle node failures within a Couchbase cluster or Elasticsearch cluster and adapt accordingly. Replication from a Couchbase cluster continues from functioning nodes and the items are sent to available servers in the Elasticsearch cluster.
  • Recovery from network failure. The plug-in is aware of what data has already been replicated from Couchbase and what data still needs to be replicated. If a network failure interrupts data transfer from a Couchbase cluster, after the network issue has been resolved, replication resumes for remaining data.

If you are already building applications with Couchbase Server, you are probably aware of using views to index and query data from the server. You use this functionality to find, retrieve, and sort data based on document metadata and specified document attributes. For example, if you have a database that contains information about beers, you could use views to retrieve all beer documents where the alcohol percentage is greater than 8%. Or you could use views to calculate the average alcohol percentage of all beers in an application. Providing full-text search for Couchbase documents with Elasticsearch complements this functionality by enabling you to provide text-based search results. The following figure shows the different elements in a system using Couchbase Server and Elasticsearch:

Your website or web application interacts with Couchbase Server via a Couchbase SDK. SDKs are available for a variety of popular web programming languages. Each SDK provides APIs for establishing a connection with the server and for communicating reads, writes, and other functions with the server. The SDK also provides methods to index and query data from Couchbase Server by using views.

To provide full-text search with Couchbase Server you need to have a cluster of Elasticsearch engines, the Couchbase Elasticsearch plug-in, and a running Couchbase cluster. After an application writes or updates data in Couchbase Server, the server replicates a copy of that data to the Elasticsearch cluster for indexing. The Elasticsearch cluster performs indexing based on text content in your data. After the data is indexed, you can use an Elasticsearch client to send a search query to Elasticsearch via HTTP. Elasticsearch does not keep an entire copy of each item replicated from Couchbase cluster. After Elasticsearch indexes an item it keeps the ID for the item and other metadata, but discards the content to remain efficient. After your application queries Elasticsearch for an item via HTTP, it sends back document IDs that you can use to retrieve the entire document from Couchbase Server.

How Querying Works

Using Couchbase Server with Elasticsearch, users can actually perform searches based on rich text content within a document, such as information in text descriptions. Imagine a user wants to find a beer that has a nice fruity flavor reminiscent of blueberries. Your application can take in a query string as a web form parameter and send the query to Elasticsearch as JSON containing the term blueberry. Your application sends a query via HTTP to Elasticsearch as follows:

GET http://127.0.0.1:9200/beer-sample/_search?q=blueberry

Elasticsearch responds by sending the document ID for this beer document in a JSON array. The array will contain elements with document IDs for those documents which match the query parameter:

....
"hits" : [{
    ....
    "_source": {
        "meta" : {
            "id": "110ac410b16h"
            .....
        }
    }
}....

For purposes of brevity we omit other details from the JSON response. The id field in the response is the ID you use to retrieve a document from Couchbase Server. With this ID, you can retrieve a document from Couchbase for the beer document that contains the term blueberry in the text description:

{
    "name" : "Wild Blueberry Lager",
    "abv" : 8,
    "brewery_id" : "110f01",
    "description" : "....blueberry aroma....",
    "style" : "Belgian Fruity Lambic"
    ....
}

In addition to text search queries, you can also provide logic and regular expressions to describe search criterion.