Index Storage Modes

Index Storage Modes

A key feature of global secondary indexes is the ability to change the underlying storage method to best suit your indexing needs. Both a memory-optimized storage engine and a disk-optimized (standard) storage engine are available.

Memory-Optimized Global Indexes

Memory optimized global secondary indexes is an additional storage setting for Couchbase Server clusters. Memory optimized global secondary indexes (also called memory optimized indexes or MOI) can perform index maintenance and index scan faster at in-memory speeds.

Memory Optimized Global Secondary Index Performance

There are several performance advantages to using memory optimized global secondary indexes:
  • MOIs use a memory efficient index structure for a lock-free index maintenance and index scan. Memory optimized indexes also provide a much more predictable latency with queries as they never reach into disk for index scans.
  • MOIs store a snapshot of the index on disk. However writes to storage are done purely for crash recovery and is not in the critical path of latency of index maintenance or index scans. The snapshots on disk are used to avoid rebuilding the whole index when an index node experiences failure.
  • In short, MOIs run at in-memory speeds and perform an order of magnitude faster in several cases compared to standard global secondary indexes.
With memory optimized global secondary indexes, it is important to monitor memory usage over time. Memory optimized indexes have to reside in memory. Indexes on a given node will stop processing further mutations, if the node runs out of index RAM quota. The index maintenance is paused until enough free memory becomes available on the node. There are two important metrics you need to monitor to detect the issues:
  • MAX Index RAM Used %: Reports the max ram quota used in percent (%) through the cluster and on each node both real-time and with a history over minutes, hours, days, weeks and more.
  • Remaining Index RAM: Reports the free index RAM quota for the cluster as a total and on each node both real-time and with a history over minutes, hours, days weeks and more.
If a node is approaching high percent usage of Index RAM Quota, a warning is displayed in the web console, so that remedial action can be taken. Below are a few suggestions for steps which can be taken:
  • You can increase the RAM quota for the index service on the node to give indexes more RAM.
  • You can place some of the indexes on the node onto other index nodes with more RAM available.
  • Drop few indexes from the index node which is in the Paused state.
  • Flush the bucket on which indexes of the Paused node are created.

Handling Out-of-Memory Conditions

Memory-optimized global indexes reside in memory. When a node running the index service runs out of configured Index RAM Quota on the node, indexes on the node can no longer process additional changes to the indexes on the node that has run out of memory. The index service logs an error in the Couchbase Server log indicating the condition. You can also observe how much memory is available on each node running the index service using the index statistic Max Index RAM Used % under "Server Resources" stat section of bucket statistics.

Memory-optimized indexes on a node that has run out of memory continue to stay in the ONLINE state. However, the round-robin logic routes traffic away from this index node. If an index is in OFFLINE, DEFERRED, or BUILDING state, the index is not transitioned to the ONLINE state.

When the node is restarted, the indexes stay in the BUILDING state until the last persisted snapshot is read from disk into memory. The additional catchup is done in ONLINE state. Queries with consistency=request_plus or consistency=at_plus fail if the timestamp specified exceeds the last timestamp processed by the specific index on the node. However, queries with consistency=unbounded continue to execute normally.

To recover from an out-of-memory situation, use one or more of the following fixes:
  • Increase the Index RAM Quota sufficiently enough to give indexes the additional memory to process requests.
  • Drop other indexes on the node to free up memory. On nodes with multiple indexes, drop the unimportant indexes to free memory.
  • Drop buckets with indexes. Dropping a bucket automatically drops all the dependent indexes and has the same effect as dropping all indexes on a bucket.
  • Flush buckets with indexes. Flushing a bucket deletes all data in a bucket. Even if there are pending updates that are not processed, flushing a bucket causes all indexes to drop their data.
    Important: Deleting data selectively from buckets will not decrease the memory usage in an out-of-memory condition. Mutations are processed in sequence and indexes can not process mutations in an out-of-memory condition.

Standard Global Secondary Indexes

Standard global secondary indexes is the default storage setting for Couchbase Server clusters. Standard global secondary indexes can index larger data sets as long as there is disk space available to store the index.

Standard global secondary indexes use a disk optimized format that can utilize both memory and persistent storage for index maintenance and index scans.

Standard Global Secondary Index Performance

Different from the memory-optimized storage setting for GSI, the performance of standard GSIs depend heavily on the IO subsystem performance. Standard GSIs come with 2 write modes:
  • Append-only Write Mode: Append-only write mode is similar to the writes to storage in the data service. In append-only write mode, all changes are written to the end of the index file (or appended to the index file). Append-only writes invalidate existing pages within the index file and require frequent full compaction.
  • Circular Write Mode: Circular write mode optimizes the IO throughput (IOPS and MB/sec) required to maintain the index on storage by reusing stale blocks in the file. Stale blocks are areas of the file that contain data which is no longer relevant to the index, as a more recent version of the same data has been written in another block. Compaction needs to run less frequently under circular write mode as the storage engine avoids appending new data to the end of the file.

    In circular write mode, data is appended to the end of the file until the relative index fragmentation (stale data size / total file size) exceeds 65%. Block reuse is then triggered which means that new data is written into stale blocks where possible, rather than appended to the end of the file.

    In addition to reusing stale blocks, full compaction is run once a day on each of the days specified as part of the circular mode time interval setting. This full compaction does not make use of the fragmentation percent setting, unlike append-only write mode. Between full compaction runs, the index fragmentation statistic will not decrease and will likely display 65% most of the time, this particular metric is not relevant for indexes using circular write mode.

By default, Couchbase Server uses the circular write mode for standards GSIs. Append only write mode is provided for backward compatibility with previous versions.

When placing indexes, it is important to note the disk IO "bandwidth" remaining on the node as well as CPU, RAM and other resources.

Changing the Global Secondary Index Storage Mode

Storage mode for GSI is a cluster-level setting. Currently, the storage mode option sets the storage mode for all indexes on the cluster across all buckets. The storage mode option cannot be changed dynamically either. To change from standard GSI to memory-optimized GSI or vice versa, you need to remove all index service nodes in the cluster. Here is a step-by-step guide to change the storage mode option:
  1. Identify the nodes that are running the index service. You can do this by simply looking at the Server Nodes page on the Web Console. The Services column displays the nodes that have the index service enabled.
  2. Click Remove on each of the nodes that has the index service enabled and rebalance to remove the nodes from the cluster.
    Note: If you are running a single node, the only way to change GSI storage mode setting is to uninstall and install the server again.
    As you remove all the index service nodes, all the indexes in the system are dropped and the N1QL queries will fail. To maintain availability, you can set up a new cluster with the desired storage mode option for GSI and use cross datacenter replication (XDCR) to replicate the data to the new cluster. If you don't have a spare cluster, you can also create all the indexes using the view indexer. See the CREATE INDEX statement and the USING VIEW clause for details. However, the view indexer for N1QL provides different performance characteristics as it is a local index to each data node and not a global index like GSI. For better availability when changing the storage mode from MOI to GSI, use the XDCR approach as opposed to view indexes for N1QL in production systems.
  3. Once all the index service nodes are removed, visit the Settings tab and Cluster Settings page and change the Index Storage Mode to the desired new mode. You can also set this option during the addition of the first node that has the index service enabled.
  4. Add new nodes and confirm the new global secondary index storage mode. At this point, all new GSIs will use the new storage mode setting from the cluster.