Querying Data with N1QL

Querying Data with N1QL

The N1QL query requests come in through port 8093 to one of the nodes running the query service. The communication happens over the REST API exposed by the N1QL query engine. Subsequent requests are load balanced across all nodes running the query service to execute N1QL queries.

As N1QL query requests arrive on a node, the N1QL query engine parses the N1QL statement to be executed and compiles an execution plan. The compiled execution plan is then executed by the query engine and depending on the query execution plan, the query engine collaborates with the index service and data service.

N1QL can be used to execute ad hoc queries on all buckets as long as there is a primary index available on the bucket. N1QL query statements utilize the full power of SQL language. For example, you can do the following and much more:
  • Specify a list of attributes to be returned using the SELECT clause
  • Target multiple buckets using the FROM clause
  • Filter data based on powerful predicates and expressions using the WHERE clause
  • Aggregate results using GROUP BY
  • Order results using ORDER BY

The REST API provides additional parameters to control and optimize the execution of the N1QL query. Prepared statement execution enables repeating queries to skip the parsing and execution planning phases of query execution. Additional consistency levels enable queries to set staleness levels.

The following diagram shows the N1QL query engine components - listener, query processor, and data stores.
Figure 1. Components of N1QL query engine

Listener
N1QL query engine listener is responsible for receiving the concurrent query requests.
Query processor
N1QL query processor starts query execution by parsing the query. The parser ensures that the submitted N1QL statement is a valid statement. The Optimizer evaluates the execution paths available and decides on the lowest latency path for the execution. It generates an execution plan for the query that optimizes the query execution path. The execution plan is assembled into a series of operators. The Execution engine receives the execution plan from the optimizer and executes the operators. Operators stream results processing various parts of the query eventually coming up with the final result.
Data stores
Data stores abstract the access to variety of data sources. Couchbase Server data store is used to access Couchbase Server data. The data stores module for Couchbase Server is responsible for governing authentication and accessing both the data and index service. The module also is required to collect the metadata to allow optimizing and execution of the query from the store. There are other data stores such as the one that can access the local filesystem.
Once the parsing and planning is done, the operators are assembled into a plan and executed based on the query as shown in the following diagram. The parallelization of the operators is illustrated with the multiple stacked boxes under scan, fetch, join, filter, and other operators.
Figure 2. Execution flow for N1QL queries

Prepared query execution

The N1QL parser parses and optimizes N1QL queries. The process of parsing and compiling execution plans can be resource intensive. If similar queries are executed by the application repeatedly, you can prepare statements and subsequently execute the prepared version of the statement without having to bear the cost of parsing and planning.

Query consistency with N1QL

N1QL can execute queries through Global Secondary Indexes (GSI) or map-reduce views. Both GSI and view indexers process mutations as soon as they are received by the bucket. However, mutations that arrive at the bucket do not block and wait for GSI or view indexing for the mutation to complete. That is, GSI and views are eventually consistent with ongoing mutations.

When querying with N1QL, as a developer, you may require varying consistency levels. Consistency levels for any given N1QL query can be configured using the staleness parameter for the query. The following consistency levels can be specified for the staleness parameter:
  • scan_consistency=not_bounded

    This level returns the query with the lowest latency as it is the most relaxed consistency level. Selecting this option essentially means the query can return data that is currently indexed and accessible by the index or the view. The query output can be arbitrarily out-of-date if there are many pending mutations that have not been indexed by the index or the view. This consistency level is useful for queries that favor low latency and do not need precise and most up-to-date information.

  • scan_consistency=at_plus

    This level provides the read-your-own-write consistency level and thus executes with higher latencies compared to the unbounded consistency level. This consistency level requires application to provide a scan_vector. Scan_vector is typically the logical timestamp acquired with the most recent update to the data. Applications through the SDK maintain the scan_vector and pass that in to the queries running with at_plus consistency level to achieve just enough consistency to be able to read the write. A query submitted with at_plus consistency level requires all mutations, up to the moment of the scan_vector (the logical timestamp passed in with at_plus), to be processed before the query execution can start.

  • scan_consistency=request_plus

    This level provides the strictest consistency level and thus executes with higher latencies than the other levels. This consistency level requires all mutations, up to the moment of the query request, to be processed before the query execution can start. This ensures that any writes that are done prior to issuing the query request, and maybe more recent mutations, are indexed by the GSI or the view indexer, and will be returned by the N1QL query if it qualifies for the resultset. This guarantee is important to applications that require consistent reads or read-your-own-write semantics.

Global Secondary Indexes versus Views with N1QL queries

N1QL can utilize both GSIs and Views. Before we list the differences between views and GSIs, it is important to note that N1QL utilizes a subset of view capabilities. N1QL does not support user defined map-reduce definitions and restricts the definitions to a subset of capabilities only available through the CREATE INDEX statement.

At a high level, GSIs are purpose-built for N1QL and thus provide many benefits. For more information, see Global secondary indexes versus views.