Global Secondary Indexes (GSIs)

Global Secondary Indexes (GSIs)

Global secondary index (GSI) is a powerful solution for secondary lookup queries that are required for interactive applications that require low latencies. Global secondary indexes (GSIs) are defined using the CREATE INDEX statement in N1QL. The CREATE INDEX statement supports the use of document attributes, N1QL expressions including functions, and a filter (a WHERE clause to limit the documents being indexed) in the index key.

The indexer for GSI processes the mutations to the bucket and creates a B+ tree for fast scans on the index key. Queries that filter on a range can use GSI to quickly acquire the document keys that satisfy the filter expression. They can then query the bucket to get the full document values and project the final query result.

Indexing with Global Secondary Indexes

Global secondary indexes (GSIs) are defined using the CREATE INDEX statement in N1QL. The CREATE INDEX statement supports the use of document attributes and N1QL expressions including functions in the index key. Using CREATE INDEX, you can also define a filter (that is, a "WHERE" clause with WHERE type="person_document") to limit the index to person documents in the bucket) which in turn limits the documents being indexed. There can only be zero or one index key output per document key.

For example, you can create a global secondary index, also referred to as index, on name that indexes beer name attributes for documents of type "beer" using the following CREATE INDEX statement in N1QL.
    CREATE INDEX Index1_beer_name 
    ON `beer-sample`(name) 
    WHERE type="beer" USING GSI;
GSI processes mutations in the bucket by coordinating with the following processes:
Projector and Router
This process is responsible for processing incoming mutations on the bucket. It resides on the nodes that run data service and thus contains active vBuckets. This process understands the indexes that exist on the bucket at any given time and processes incoming mutation to only send the relevant information to the Supervisor process for maintaining indexes.

Every node has a single projector and router process which runs the data service. It can listen on all buckets on the node, if there are indexes on these buckets. Projector and router process sends a stream of changes per bucket to each node running the index service. Indexes may index only a small set of the attributes in a document and may filter documents to index a subset of the documents in a bucket. The stream of changes is optimized to only contain relevant information for the indexes created on the nodes running the index service.

Supervisor
This process is responsible for both processing mutations sent by the projector and router process for the indexes it maintains, and for responding to scan requests from N1QL queries that are executed as part of the query service. It resides on the nodes that run the index service.

Each node running the index service can contain different indexes (for example, index1 and index2 can be on node A while index3 and index4 can be on node B). Each index only indexes mutations from one bucket but each node running the index service may contain indexes from multiple buckets (for example, index1 and index3 can be indexes on bucket X and index2 and index4 can be indexes on bucket Y). Indexes may index a small set of the attributes in a document and may filter to documents to only index a subset of the documents in a bucket.

Every node has a single supervisor process which runs the index service. The supervisor process listens to change streams from the projector and router processes from all nodes running the data service. It evaluates the incoming stream of changes for the specific indexes created on the node running the index service. The stream of changes is optimized to only contain relevant information for the indexes created on the nodes running the index service. This ensures optimized data exchange and faster indexing between the projector and router process and supervisor process. This is especially important when using multidimensional scaling to deploy data service and index service to independent set of nodes within the cluster creating dedicated zones inside the cluster.

When a mutation arrives at the bucket, the mutation is processed in memory by the vBucket manager that is responsible for the document key in the data service. The mutation is then queued for replication with DCP to propagate the mutation to replica vBuckets, to the views on the bucket, and to the projector and router process for GSI processing. The projector and router process picks up the mutation from DCP and removes the attributes that are not part of index filter or definition on any of the GSI indexes being maintained on nodes running the index service. The supervisor process picks up the stream of changes and applies them to the indexes maintained locally on the node.
Figure 1. Execution flow of Global Secondary Indexes (GSIs)