Global Secondary Index Storage

Global Secondary Index Storage

Nitro

Memory-Optimized Indexes use a totally in-memory storage engine designed for low-latency, high-throughput storage of indexes called Nitro. Its write throughput scales almost-linearly with CPU cores due to its innovative use of lockless data structures. Fast index scans of these datastructures are facilitated by skip lists, drastically improving query performance when using indexes.

Overlying the skiplist used to store the data is a powerful Multi Version Concurrency Control component designed to offer the following:

  • Immutable Snapshots: Concurrent writers add or remove items into the skiplist simultaneously. A snapshot of the current items can be created to provide a point-in-time view of the skiplist. This is useful for providing repeatable stable scans. Users can create and manage multiple snapshots. If an application requires atomicity for a batch of skiplist operations, it can apply a batch of operations and create a new snapshot. Changes are invisible to scans until a new snapshot is created.

  • Fast and Low Overhead Snapshots: Readers of the skiplist use a snapshot handle to perform all lookup and range queries. An indexer application typically requires many snapshots to be created every second to serve incoming queries. The overhead of creating and maintaining a snapshot is therefore minimal to service the high rate of snapshot generation.

  • Avoid Phantom Reads: A query operation using a specific snapshot must always return the same results. Items being removed or added to an index will not be visible to the query, a query operation using a snapshot is repeatable.

  • Memory Efficiency: Disk-oriented data structures such as B+Trees process updates in the unit of disk page sizes. This approach introduces significant storage overhead per item. For a single entry update, the entire residing page needs to be copied. Nitro allocates the only exact amount of space required to hold an item. This is important for predictable capacity planning and minimization of memory usage.

  • Fast and Scalable Garbage Collection: For a system that generates hundred immutable snapshots per second, there could be hundreds of snapshots becoming unused every second. The garbage collector is therefore efficient enough to keep up with high rates of garbage generation. The Nitro garbage collector is concurrent and scales linearly with the number of concurrent writers used by the system.

Figure 1. Nitro Architecture

While Nitro is a memory-optimized storage engine, the index is also snapshotted to disk at regular intervals to allow the indexer to recover from a restart without needing to rebuild the entire index.

Plasma

The Plasma index storage engine is an extension of the Nitro storage engine, used by Standard Global Secondary Indexes. This extension allows the index to exist partially in memory and partially on disk, removing the constraint of having to have enough memory to store all indexes in their entirety.

Plasma uses lockless data structures for its on-disk format to ensure that concurrent disk access does not become a significant bottleneck. Specifically, Plasma stores the evicted pages using a log-structured format which is a form of append-only storage. Unlike B+-Trees traditionally used to store database indexes, log structured files do not rely on threads holding intermediate locks when inserting items. This means that writer threads do not interfere with each other and slow down the insertion of items.

The on-disk format also reduces the write amplification experienced when writing data to disk. It does this by incrementally persisting new data as it is evicted from the in-memory skiplist (based on an LRU algorithm) rather than always persisting entire snapshots. To retain data consistency, previous unused snapshots are merged and written to disk as full snapshots at a regular interval (every 10 minutes).