Indexing

Indexing

Couchbase Server speeds up data access with indexes. Indexing is especially important because items in production Couchbase clusters are distributed across many nodes, and any item may need to be accessed in order to satisfy a query.

Couchbase supports a variety of global secondary indexes and view-based indexes that may be used to improve performance at the cost of some processing overhead to maintain the index. Since index data is smaller than the documents themselves, it is often possible to hold an entire index on one node rather than distributing the index across multiple nodes. Doing this can make querying a node a local access operation, reducing network access times. Couchbase provides the EXPLAIN command that enables detailed exploration of query execution and index use that can help guide decisions about what to index. In order to further reduce index processing overhead and satisfy a request as quickly as possible, Couchbase employs many clever strategies such as deferred index building (create multiple indexes with a single scan) and flexible index consistency settings for controlling staleness.

Despite their name, Global Secondary Indexes (GSI) can be used to create high performance primary or secondary indexes. Couchbase global secondary indexes are partitioned separately from data. Even though Couchbase data is distributed among nodes, for best performance developers and administrators typically create indexes on a selected set of nodes running the index service. Keeping the entire index on one node means that operational complexity remains constant as the cluster grows and in case a covering index is used, it also ensures that requests to the index can be satisfied using local data without incurring network latency.

  • Composite Indexes optimize access for queries that require data from multiple attributes.
  • Covering Indexes include all of the information needed to satisfy a query without accessing the data, so they are very fast.
  • Filtered Indexes (or partial indexes) perform filtering that allows users to create an index on just a subset of the data by using the WHERE clause. This has the effect of reducing the index size and associated maintenance impacts in order to remain scalable and performant when handling large datasets.
  • Function-based Indexes result from computing the value of an expression over a range of documents.
  • Sub-document Indexes are used to index embedded structures and complex objects. Sub-document indexes can significantly improve query performance on serialized JSON objects.
  • Incremental MapReduce Views are typically used as indexes. When building a MapReduce view, a developer can custom tailor it to optimize support for a specific set of queries, usually complex queries that perform sorting and aggregation to support real-time analytics over very large datasets. For example, a MapReduce view index would be useful in building an interactive report on sales data that can be explored by date, item, or region. A single MapReduce view index can be created by emitting separate rows that simultaneously aggregate the sales data according to each of the three separate dimensions.
  • Spatial Views speed up access to multidimensional numeric data and are created using Spatial MapReduce. The spatial index is an R-Tree, a structure capable of representing multiple, potentially overlapping regions. Geometry information is supported using the GeoJSON specification. Spatial views can also be created on numeric data to define a multidimensional cube (hyper-cube), such as a relation of customer ages, incomes, and lifetime spend. The advantage of a spatial view compared to a compound index is that the spatial view can serve queries with the dimensions in any order. See this talk for more information.
  • Full Text Indexes allow developers to easily add full-text search capabilities to their application using CBFT (developer preview), search use cases without deploying additional components, which reduces operational complexity.

For more information on how the indexers and index services work, see Views, Indexing, and Index Service.