Global Secondary Indexes for N1QL

Global Secondary Indexes for N1QL

Global Secondary Index (GSI) supports a variety of OLTP like use-cases for N1QL including basic, ad-hoc, and short-running reporting queries that require filtering. For example, if you have a WHERE clause in a N1QL statement that selects a subset of your data on which the secondary index is defined, you can see a significant speedup in the N1QL query performance by using GSI.

Global secondary indexes are deployed independently into a separate index service within the Couchbase Server cluster, away from the data service that processes key based operations. GSIs provide the following advantages:
  • Predictable Performance: Core key based operations maintain predictable low latency even in the presence of a large number of indexes. Index maintenance does not compete with key based operations even under heavy mutations to data.
  • Low Latency Queries: GSIs independently partition into the index service nodes and don’t have to follow hash partitioning of data into vBuckets. Queries using GSIs can achieve low latency response times even when the cluster scales out because GSIs don’t require a wide fan-out to all data service nodes.
  • Advanced Scaling Model: GSI can be placed onto independent set of nodes. Administrators can add new indexes and evolve the application performance without stealing cycles from the incoming workload.

Creating Global Secondary Indexes

You can define a primary or secondary index using GSIs in N1QL using the CREATE INDEX statement and the USING GSI clause. For more information on the syntax and examples, see CREATE INDEX statement.

Placement of Global Secondary Indexes

GSIs reside on the index nodes in the cluster. Each index service node can host multiple indexes and a single index can be hosted on multiple index nodes. Every index has an index key(s) (used for lookup). When the index type is not primary index, index can have an index filter with the WHERE clause.

CREATE INDEX index_name ON ( index_key1, ..., index_keyN) 
    WHERE index_filter 
    WITH { "nodes": [ "node1:8091" ] } 
    USING GSI | VIEW;

Based on the index filter, the index can be partitioned across multiple index service nodes and placed on given nodes using the WITH clause and "nodes" argument in CREATE INDEX statement.

Administrators can place each index partition in a separate node to distribute index maintenance and index scan load. Index metadata stored on the index node knows about the distribution of the index. GSI does not use scatter-gather. Instead, based on the index metadata, it only touches the nodes that have the relevant data.

Index Performance

GSIs provide a lower latency scan compared to view indexes due to its architecture. GSI, as its name suggests is a global index that is independently placed within the cluster as opposed to Views, which are local indexes placed aligned to the data distribution.
Figure 1. Query Execution with Global Indexes

GSI comes with two storage settings:
  • Memory optimized global secondary indexes
  • Standard global secondary indexes. Standard GSIs come with two write modes:
    • Append-only write mode
    • Circular write mode

The default storage setting for GSI is standard GSI. Memory optimized GSI setting can be selected at the time of the initial cluster setup. Write mode can be selected when the storage setting for GSI is standard. You can change the write mode at any time while the cluster is running, however the setting change require a restart of the index service which can cause a short period of unavailability.

Global secondary indexes can be placed on a single node. However as the number of index scans, mutations, the total items indexed, and the total index size increases, it will be necessary to scale the index. You can scale the index in 2 ways:
  • Load balancing with GSI. You can create replicas of the same index to allow scans to be routed to multiple nodes.
  • Partitioning with GSI. You can split the index into multiple partitions, to distribute indexed items, index scan requests, and limit the mutations and size the single index has to handle.
In either approach, you also need to scale the index service to allocate more computational resources to indexes.
Load Balancing with GSI
It is possible to create indexes with replicas distributed amongst multiple nodes. Unlike replicas used in the data service, each replica index is considered to be equivalent to the active index. This means that the query service automatically load balances between the replica indexes, even when using the USE INDEX statement or prepared statements.
See Index Replication and High Availability with N1QL for further details on index replication.
Partitioning with GSI
Some of your indexes may no longer fit a single node as the number of mutations, index size, and item count increases. You can partition a single index using a WHERE clause with CREATE INDEX and by placing partitions across multiple index service nodes under different index names.

To load-balance GSIs, you must manually specify the nodes on which index partitions should be built on.

You can also create indexes by partitioning your indexes as shown in the following example using ranges:
CREATE INDEX productName_index1 ON bucket_name(productName, ProductID) 
       WHERE type="product" AND productName BETWEEN "A" AND "K" USING GSI 
       WITH {"nodes":["node1:8091"]};
       
CREATE INDEX productName_index2 ON bucket_name(productName, ProductID) 
       WHERE type="product" AND productName BETWEEN "K" AND "Z" USING GSI 
       WITH {"nodes":["node2:8091"]};
Scaling the Index Service
Couchbase Server scales indexes independent of data and queries. With multidimensional scaling, you can allocate separate hardware resources for separate services, and avoid resource contention by performing queries, maintaining indexes, and writing data to different nodes. If your application needs more indexing resources, you can either scale out your infrastructure to add more index nodes, or scale up the index services to handle more workload.

Query and Index Consistency

In Couchbase Server, mutations to data in the data service is done with full consistency. All mutations to a given key are done using the same vBucket on a node and are immediately available to anyone reading the latest value for the given key. However, indexes are maintained in an eventually consistent manner. This is true for all indexers (GSI, View, and Spatial). At query time, you can specify a query consistency flag for each N1QL query request, similar to the view API. The query consistency flag can be one of the following:
  • Not_bounded: This scan consistency flag executes the query immediately without requiring any consistency for the query. If the index maintenance is running behind, query may return out-of-date results. Not_bounded scan consistency has the same characteristics as stale=ok in the view API.
  • At_plus: This scan consistency flag executes the query but require the indexes to be updated to the logical timestamp of the last update performed by the application. For example, an application issuing the query may have done its last update 10 ms ago. The logical timestamp of the update is retrieved with the mutation ACK response and is passed to the query request. This behavior achieves consistency, at least or later than the moment of the logical timestamp. If the index maintenance is running behind the logical timestamp, the query waits for the index to catch up to the last updates logical timestamp. At_plus scan consistency flag automatically degrades to the same characteristics as stale=false in the view API.
  • Request_plus: This scan_consistency flag executes the query but require the indexes to be updated to the logical timestamp of the query request. For example, an application issuing the query may have done its last update 10 ms ago. An application issuing the query with request_plus scan consistency flag takes the logical timestamp of the query request. This behavior achieves consistency, at least or later than the moment of the request timestamp. If the index maintenance is running behind the request timestamp, the query waits for the index to catch up to the request timestamp. At_plus scan consistency flag can yield faster response times if the application can relax its consistency requirements to read-your-own-write, as opposed to a stricter request time consistency. Request_plus scan consistency value has the same characteristics as stale=false in the view API.
For N1QL, the default consistency setting is not_bounded.

Index Replication and High Availability with N1QL

Individual GSIs can be automatically replicated to other nodes in the cluster to achieve high availability, ensuring that an index continues to function even if a node hosting the index is unavailable. Queries will load balance across the indexes and if one of the indexes become unavailable, all requests are automatically rerouted to the available remaining index without application or admin intervention.

Replica indexes can be created in three different ways, each using the CREATE INDEX statement. The first of these is to specify the nodes which the index should be located in the WITH clause. The following example creates an index with two replicas, one on node2 and one on node3, with the active on node1.

CREATE INDEX productName_index1 ON bucket_name(productName, ProductID) 
       WHERE type="product" USING GSI 
       WITH {"nodes":["node1:8091", "node2:8091", "node3:8091"]};

It is also possible to create an index with replicas without specifying which nodes the index should reside on. Adding num_replica to the WITH clause creates an index with the specified number of replicas, distributed across index nodes in the cluster. When creating an index with replicas in this manner, the indexer will attempt to distribute the replicas based on the server groups in use in the cluster where possible. This means that should an entire server group go down, a replica of the index should still be available on one (or more) of the nodes in another (or multiple) server group(s).

The following example creates an index with two replicas without explicitly specifying which nodes they should reside on:
CREATE INDEX productName_index1 ON bucket_name(productName, ProductID) 
       WHERE type="product" USING GSI 
       WITH {"num_replica": 2};

If both nodes and num_replica are specified in the WITH clause, then the number of nodes specified must be one greater than num_replica.

Finally, it is possible to create GSIs with replicas without explicitly stating the number of replicas. This is done by altering the number of replicas that the indexer will create, which defaults to 0. Changing the default number of replicas can be done using the following curl command and must be done by a full administrator:
curl -u <username>:<password> <host>:9102/settings -d "{\"indexer.settings.num_replica\": <num_replicas>}"
If manually specifying the number of replicas to create for a given index in the CREATE INDEX statement, then this takes precedence over the default number of replicas from the indexer.
Important: When using any of the three above methods, you must ensure that the number of replicas being created is less than the number of index nodes in your cluster, otherwise the index creation will fail. Once an index has been created with a given number of replicas, if the number of index nodes in a cluster goes below the number of nodes required then new replicas will be created on any incoming index nodes, until the desired number of replicas exist for a given index.
Once an index with replicas has been created, you will be able to see each of its replicas (and active) as individual entries in the 'Index' tab in the web console. However, as the replicas are opaque to the query service, you will not see these replicas in system:indexes.

Standard Global Secondary Indexes

Standard global secondary indexes is the default storage setting for Couchbase Server clusters. Standard global secondary indexes (also called global secondary indexes, indexes, or GSI) can index larger data sets as long as there is disk space available for storing the index.

Standard Global Secondary Indexes uses ForestDB for indexes that can utilize both memory and persistent storage for index maintenance and index scans. ForestDB is Couchbase’s state-of-the-art storage engine with a modified data structure to increase read and write performance from storage.

Enabling Standard Global Secondary Indexes

By default, Couchbase Server uses standard global secondary indexes storage setting with the circular write mode for all indexes.

Standard and memory optimized storage settings apply to all indexes in the cluster and cannot be mixed within a single cluster.

At the time of the cluster’s initial setup, storage setting can be switched between standard and memory optimized GSI storage settings. Changing the storage setting for GSI requires removing all index service nodes.

Standard Global Secondary Index Performance

Different from the memory optimized storage setting for GSI, the performance of standard GSIs depend heavily on the IO subsystem performance. Standard GSIs come with 2 write modes:
  • Append-only Write Mode: Append only write mode is similar to the writes to storage in the data service. In append-only write mode, all changes are written to the end of the index file (or appended to the index file). Append only writes invalidate existing pages within the index file and require frequent full compaction.
  • Circular Write Mode: Circular write mode optimizes the IO throughput (IOPS and MB/sec) required to maintain the index on storage by reusing stale blocks in the file. Stale blocks are areas of the file that contain data which is no longer relevant to the index, as a more recent version of the same data has been written in another block. Compaction needs to run less frequently under circular write mode as the storage engine avoids appending new data to the end of the file.

    In circular write mode, data is appended to the end of the file until the relative index fragmentation (stale data size / total file size) exceeds 65%. Block reuse is then triggered which means that new data is written into stale blocks where possible, rather than appended to the end of the file.

    In addition to reusing stale blocks, full compaction is run once a day on each of the days specified as part of the circular mode time interval setting. This full compaction does not make use of the fragmentation percent setting unlike append-only write mode. Between full compaction runs, the index fragmentation displayed in the UI will not decrease and will likely display 65% most of the time, this particular metric is not relevant for indexes using circular write mode.

By default, Couchbase Server uses the circular write mode for standards GSIs. Append only write mode is provided for backward compatibility with previous versions.
When placing indexes, it is important to note the disk IO "bandwidth" remaining on the node as well as CPU, RAM and other resources. You can monitor the resource usage for the index nodes through Web Console and pick the nodes that can house your next index.
Figure 2. Monitor Resource Usage for Indexes

There are also per-index statistics that can help identify the item counts,disk and data size, and more individual statistics for an index.
Figure 3. Per-index Statistics

Aside from the performance characteristics, the mechanics of creating, placing, load balancing, partitioning and HA behavior is identical in both standard and memory optimized global secondary indexes.

Memory-Optimized Global Indexes

Memory optimized global secondary indexes is an additional storage setting for Couchbase Server clusters. Memory optimized global secondary indexes (also called memory optimized indexes or MOI) can perform index maintenance and index scan faster at in-memory speeds.

Enabling Memory-Optimized Global Indexes

By default, Couchbase Server uses standard global secondary indexes storage setting with the circular write mode for all indexes. In this release, standard vs memory optimized storage settings apply to all indexes in the cluster and cannot be mixed within a single cluster. At the time of the cluster’s initial setup, storage setting can be switched to memory optimized GSI setting.
Note: Changing the storage setting from standard GSI to memory optimized GSI or visa versa cannot be done online in a single cluster. Changing the storage setting for GSI requires removing all index service nodes in the cluster.
Note: Regardless of the GSI storage setting, standard GSI vs memory optimized GSI, it is still possible to create indexes using GSI or view during CREATE INDEX statement with the USING clause.

Memory Optimized Global Secondary Index Performance

There are several performance advantages to using memory optimized global secondary indexes: MOIs use skiplist, a memory efficient index structure for a lock-free index maintenance and index scan. Lock-free architecture increase concurrency of index maintenance and index scans. This enhances the index maintenance rate by more than an order of magnitude under high mutation rates. Skiplist based indexes take up less space in memory. Memory optimized indexes can also provide a much more predictable latency with queries as they never reach into disk for index scans.

MOIs use ForestDB for storing a snapshot of the index on disk. However writes to storage are done purely for crash recovery and is not in the critical path of latency of index maintenance or index scans. The snapshots on disk is used to avoid rebuilding the index when a node experiences failure. Building the index from the snapshot on disk, minimizes the impact of index node failures on the data service nodes.

In short, MOIs completely run at in-memory speeds and performs an order of magnitude faster in several cases compared to standard global secondary indexes.

With memory optimized global secondary indexes, it is important to monitor memory usage over time. Memory optimized indexes have to reside in memory. Indexes on a given node will stop processing further mutations, if the node runs out of index RAM quota. The index maintenance is paused until RAM Quota becomes available on the node. There are two important metrics you need to monitor to detect the issues:
  • MAX Index RAM Used %: Reports the max ram quota used in percent (%) through the cluster and on each node both realtime and with a history over minutes, hours, days, weeks and more.
  • Remaining Index RAM: Reports the free index RAM quota for the cluster as a total and on each node both realtime and with a history over minutes, hours, days weeks and more.
If a node is approaching high percent usage of Index RAM Quota, it is time to take action.
  • You can either increase the RAM quota for the index service on the node to give indexes more RAM.
  • You can also place some of the indexes on the node in other nodes with more RAM available.
Optionally, you can also perform the following steps if a node is approaching high percent usage of Index RAM Quota:
  • Drop few indexes from the index node which is in the Paused state.
  • Flush the bucket on which indexes of the Paused node are created.
It is also important to understand other resource utilization beyond RAM. You can monitor the resource usage for the index nodes through Web Console and pick the nodes that can house your next index.
Figure 4. Monitor Resource Usage for Index Nodes

There are also per-index statistics that can help identify the item counts, disk/snapshot and data size and more individual statistics for an index for memory optimized indexes.
Figure 5. Per-index Statistics

Aside from the performance characteristics, the mechanics of creating, placing, load balancing, partitioning and HA behavior is identical in both standard and memory optimized global secondary indexes.

Handling Out-of-Memory Conditions

Memory-optimized global indexes reside in memory. When a node running the index service runs out of configured Index RAM Quota on the node, indexes on the node can no longer process additional changes to the indexes on the node that has run out of memory. The index service logs an error in the Couchbase Server log indicating the condition. You can also observe how much memory is available on each node running the index service using the index statistic Max Index RAM Used % under "Server Resources" stat section of bucket statistics.

Memory-optimized indexes on a node that has run out of memory continue to stay in the ONLINE state. However, the round-robin logic routes traffic away from this index node. If an index is in OFFLINE, DEFERRED, or BUILDING state, the index is not transitioned to the ONLINE state.

When the node is restarted, the indexes stay in the BUILDING state until the last persisted snapshot is read from disk into memory. The additional catchup is done in ONLINE state. Queries with stale=false or RYOW semantics fail if the timestamp specified exceeds the last timestamp processed by the specific index on the node. However, queries with stale=ok continue to execute normally.

To recover from an out-of-memory situation, use one or more of the following fixes:
  • Increase the Index RAM Quota sufficiently enough to give indexes the additional memory to process requests.
  • Drop other indexes on the node to free up memory. On nodes with multiple indexes, drop the unimportant indexes to free memory.
  • Drop buckets with indexes. Dropping a bucket automatically drops all the dependent indexes and has the same effect as dropping all indexes on a bucket.
  • Flush buckets with indexes. Flushing a bucket deletes all data in a bucket. Even if there are pending updates that are not processed, flushing a bucket causes all indexes to drop their data.
    Important: Deleting data selectively from buckets does not help, as mutations are processed in sequence and indexes can not process mutations in an out-of-memory condition.

Changing the Global Secondary Index Storage Mode (Standard vs Memory Optimized)

Storage mode for GSI is a cluster level setting. Currently, the storage mode option sets the storage mode for all indexes on the cluster across all buckets. The storage mode option cannot be changed dynamically either. To change from standard GSI to memory optimized GSI or vice versa, you need to remove all the index service nodes in the cluster. Here is a step by step guide to change the storage mode option:
  1. Identify the nodes that are running the index service. You can do this by simply looking at the "Server Nodes" page on the Web Console. The Services column displays the nodes that have the index service enabled.
  2. Click Remove on each of the nodes that has the index service enabled and push rebalance to remove the nodes from the cluster.
    Note: If you are running a single node, the only way to change GSI storage mode setting is to uninstall and install the server again.
    As you remove all the index service nodes, all the indexes in the system are dropped and the N1QL queries will fail. To maintain availability, you can set up a new cluster with the desired storage mode option for GSI and use cross datacenter replication to replicate the data to the new cluster. If you don't have a space cluster, you can also create all the indexes using the View indexer. See the CREATE INDEX statement and the USING VIEW clause for details). However, the View indexer for N1QL provides different performance characteristics as it is a local index and not a global index like GSI. For better availability when changing the storage mode from MOI to GSI, we recommended that you use the XDCR approach as opposed to views in production systems.
  3. Once all the index service nodes are removed, visit the Settings tab and Cluster settings page and change the Index Storage Mode to the desired new mode. You can also set this option during the addition of the first node that has the index service enabled.
  4. Add new nodes and confirm the new global secondary index storage mode. At this point, all new GSIs will use the new storage mode setting from the cluster.