Indexing

Indexing

An index is a data-structure that provides quick and efficient means to query and access data, that would otherwise require scanning a lot more documents. Couchbase Server provides three types of indexers which you can use to build indexes. You must keep in mind the guidelines or restrictions described in the following sections when selecting the right indexes in your application.

Global Indexes versus Local Indexes

Architecturally distributed databases can benefit from both local and global indexes. Couchbase Server provides both.

Local indexes provide the ability to index data locally on the node at a high speed and optimize network efficiency while building indexes. This is great if the index has complex processing logic or pre-aggregating data in various different shapes for analytics. However, as the cluster widens to many nodes, over time the network processing can increase the query response time. For real time analytic systems, reporting environments, or dashboards, this may work fine. But for interactive applications running simple lookup queries or range scans, latency can be a big issue. MapReduce Views and Spatial Views provide the local indexing capability in Couchbase Server.

Global indexes provide the ability to execute interactive queries quickly with less over-the-network processing (or without scatter-gather). Global indexes can be independently partitioned and are not spread equally across nodes like local indexes. Global index maintenance requires network IO, however at scan time, global indexes simply eliminate scatter-gather by aligning the partitioning logic to query and by centralizing the indexes to fewer partitions. Global Secondary Indexes in Couchbase Server provide the global indexing capability in Couchbase Server.

The following figure demonstrates the difference between global and local indexes for query execution. In short, global indexes are great for low latency queries, while local indexes are great for more complex index logic that can pre-aggregate data.
Figure 1. Local versus Global Indexes

Indexes and Querying Couchbase Server Data

Applications can query the Couchbase Server in one of the following ways:

  • Using key-value access. Applications can directly access data through the data service with document keys for fastest access.
  • Using N1QL. Applications can choose to use SQL-like syntax with global secondary indexes or MapReduce views to speed up queries.
  • Using Couchbase View API directly.Applications can choose to directly query the MapReduce view indexes or spatial views for purpose built pre-computed indexes.
For querying with N1QL or the Full-text or View API there are four types of indexers in Couchbase Server and these indexes are handled by different services.
Global Secondary Index
Global Secondary Index (GSI) enables applications to perform fast queries at a high throughput using N1QL. Unlike the view index that indexes only a subset of the data local to each data node, global secondary indexes are stored on an independent set of nodes across the cluster. Global secondary indexes can index the entire bucket's data or a subset in a single location and reside on the index nodes. The index service is responsible to handle GSIs.
MapReduce views
MapReduce views (also called views) are indexes built using JavaScript map and reduce functions defined on documents in a Couchbase bucket. Views can be accessed directly from the View API or from N1QL.
Note: N1QL can only utilize views created through CREATE INDEX ... USING VIEW statements.
Views are built incrementally, re-indexing only the documents that have changed since the last index update. Couchbase Server precomputes and stores view results for fast retrieval. The data service is responsible to handle MapReduce views.
Spatial views
Couchbase Server uses spatial views to query geospatial information. A spatial view can contain geometric data that can be queried to return information based on whether the recorded geometries exist within a given multidimensional range. They can also be used for non-geometric multidimensional range queries.
The major difference between MapReduce views and spatial views is that instead of two functions, one each for map and reduce, spatial views contain a single function called spatial function. The spatial function is similar to the map function in MapReduce views. Spatial views also output more than a single index key per document, enabling queries against several fields in a single query. The data service is responsible to handle spatial views.
Note: Passing large documents to the view engine can adversely affect the performance. Hence, Couchbase Server limits the size of the documents during indexing by default. For more information, see Cluster Operations.
Full Text Index
Full text Index (FTS) enables applications to perform fast keyword searches (Google-like searches) over their data using the FTS API. Full text index is a global index just like global secondary indexes. Unlike the view index that indexes only a subset of the data local to each data node, full text indexes are independently distributed to a set of nodes across the cluster. Full text indexes can index one or more buckets data. The Full text service is responsible to handle FTS indexes.
Note: Full text Index and Full text service is provided as a Developer Preview in this version. We do not recommend using the capability in production systems.