Subqueries

Subqueries

In N1QL, a subquery is a SELECT query that is a constituent part of another N1QL query or subquery.

Using subqueries, you can create multiple levels of nesting of queries. The outer levels of queries are called outer or parent queries, and inner level subqueries are called inner or child queries, or simply subqueries.

Syntax



  • A subquery in N1QL is limited to a SELECT query, whereas the outermost parent query can be any DML statement such as SELECT, INSERT, UPDATE, DELETE or MERGE.
  • Subquery must be enclosed in parenthesis.
  • When subquery is used in FROM clause, it must be aliased.
  • Subqueries are N1QL expressions and N1QL evaluates them similar to any other N1QL language expressions. When an expression is contained in a N1QL statement, the expression is evaluated once for every input document to the statement, and same applies to a subquery-expression - the inner SELECT or subquery is executed once for every input document considered in the outer query.
  • A subquery expression returns an array every time it is evaluated, and the array contains the results of the inner SELECT subquery.

Subqueries in DML Statement Clauses

A subquery can occur as part of various clauses of the parent query where generic N1QL expressions can be used. For instance, a SELECT statement can have a subquery in projection-list, WHERE clause, FROM clause, LET clause, and GROUP BY clause. Similarly, an UPDATE, INSERT, DELETE can have a subquery in the WHERE clause. For example, consider the following query which finds all the cities with landmarks that have airports.
Example Q1:

SELECT t1.city
FROM `travel-sample` t1
WHERE t1.type = "landmark" AND 
      t1.city IN (SELECT RAW city
                  FROM `travel-sample` 
                  WHERE type = "airport");

In the above example, the subquery is used as a part of WHERE condition of the parent query and it returns an array of cities that have airports.

Note that the IN or WITHIN predicate needs an array on right-side expression as the subquery evaluates to. Further, the usage of keyword RAW in the subquery projection ensures just the city attribute values in the result without JSON wrapping. The outer query browses through each of the landmark cities and returns the city name if it is in the array of cities returned by the inner subquery.

When subquery is used in the FROM clause, that must be aliased, to be able to access and refer to the results documents of the subquery. In the following example Q2, the subquery is named as t1, and it returns the aggregated value of airport names by city and country. The outer query further processes the result of the subquery to find total number of airports by country where each city has more than 5 airports.
Example Q2:

SELECT t1.country, array_agg(t1.city), sum(t1.city_cnt) as apnum
FROM (SELECT city, city_cnt, array_agg(airportname) as apnames, country
      FROM `travel-sample` WHERE type = "airport" 
      GROUP BY city, country LETTING city_cnt = count(city) ) AS t1
WHERE t1.city_cnt > 5 
GROUP BY t1.country;

For more examples, see the Examples section.

Subqueries in N1QL Expressions

Subqueries can appear as independent SELECT queries or can be combined as a constituent part of other N1QL language expressions.

For example, in the above queries Q1 and Q2, the subquery is a full independent SELECT query. In the following example Q3, the subquery is part of a N1QL expression.

Example Q3:

SELECT array_max((SELECT count(city) as cnt FROM `travel-sample` WHERE type = "airport" GROUP BY city)[*].cnt);

In this example, the subquery (in bold) is wrapped in a parenthesis ( ) and it occurs as part of the expression [*].cnt which treats the result of the subquery as an array of JSON documents (with one attribute cnt) and extracts the cnt values from all the subquery result array elements. Finally, the array_max() function is applied to this to get the maximum number of airports in any of the cities.

Variables in Scope of a Subquery

The scope of subquery is defined as the set of names, identifiers, and variables that are accessible to the subquery. Variables include all keyspace names, aliases, LET or LETTING variables that can be referenced by the subquery. For a multilevel nested subquery, the scope includes scope of the subquery and all the scopes of its parent queries.

A subquery cannot access variables from its sibling subqueries or its parents’ sibling subqueries. For example, in the following nested query structure, the subquery SQ13 can only access the variables from SQ13, SQ1, and Q0, but not from SQ12, or SQ2.

Q0: SELECT ...
      (SELECT ..
	     (SELECT ...) AS SQ12
	     (SELECT ...) AS SQ13   ) AS SQ1
      (SELECT ...
	     (SELECT ...) AS SQ21
	     (SELECT ...) AS SQ22   ) AS SQ2
Note that the keyspace name identifiers themselves are NOT variables, whereas any aliases of the keyspace name identifiers are variables in the scope. For example, in the above query Q1, the keyspace identifier `travel-sample` in the outer query is not considered a variable for the subquery. In fact, that is the reason the subquery can also use the same keyspace name independently and still be a non-correlated subquery. However, the alias t1 defined in the outer query is considered a variable in scope for the subquery.

The following example Q4 is a correlated subquery that uses the variable t1 from the outer query:
Example Q4: Find the landmark that are named with corresponding city name.

SELECT t1.city, t1.name
FROM `travel-sample` t1
WHERE t1.type = "landmark" AND 
      t1.city IN SPLIT((SELECT RAW t2.name 
                        FROM t1 AS t2)[0]);

[
  {
    "city": "Dartmouth",
    "name": "Dartmouth Castle"
  },
  {
    "city": "Edinburgh",
    "name": "The Edinburgh Dungeon"
  }, ...
]

This above example Q4 uses the alias t1 from outer query in the FROM clause of subquery. The subquery aliases t1 to t2 to avoid conflict in same variable name t1 being present in both outer and inner queries scope. The reference to t2 in the subquery is actually referring to the same document from `travel-sample` that is being processed as t1 in the outer query. In other words, the same query can be simply rewritten (without using subqueries) as follows:

Query Q4A:

SELECT t1.city, t1.name
FROM `travel-sample` t1
WHERE t1.type = "landmark" AND 
      t1.city IN SPLIT(t1.name);

Typically, subqueries may refer to any variables and aliases available in the scope to build correlated subqueries and to perform subqueries specific to some context of outer query. See Correlated Subqueries for more details and Examples.

FROM clause in Subqueries

Keyspace Identifier versus Expression

As described in the FROM clause, the from-term can be a keyspace name or identifier or a N1QL expression:

  • Keyspace identifiers are independent sources of data for a query.

  • N1QL expressions can be constructed using various N1QL language constructs including subqueries.

    • Constant expressions are independent sources of data for a query.

    • Variable expressions depend on variables in scope and are evaluated to resolve as input data for the query. These are applicable to subqueries.

    • This applies irrespective of whether it is a simple identifier such as alias, var or a nested path identifier such as keyspace.subdoc1.subdoc2.field, alias.subdoc.field, or var.subdoc.field.

An expression can be a simple identifier or variable such as alias, var or more complex with various N1QL language constructs. Either way, N1QL evaluates from-term to resolve to keyspace identifiers or expressions as follows:

  • A from-term is considered as an expression if it is not a keyspace name identifier.
  • If simple identifier can be considered as identifier or expression depending on various factors. An identifier var is considered as an expression if it is variable in scope defined through LET or LETTING, or explicit keyspace alias in parent queries.

In the following equivalent queries, explicit alias t of `travel-sample` or LET variable x is treated as an expression and hence a nested path like t.geo.lat is allowed in the subquery FROM clause.

Example Q5A:

SELECT count(*) FROM `travel-sample` t
WHERE (SELECT RAW t.geo.alt FROM t t1)[0] > 6000 ;
Example Q5B:

SELECT count(*) FROM `travel-sample` t
WHERE (SELECT RAW alt FROM t.geo.alt)[0] > 6000;
Example Q5C:

SELECT count(*) FROM `travel-sample` t
LET x = t.geo
WHERE (SELECT RAW y.alt FROM x y)[0] > 6000;
Example Q5D:

SELECT count(*) FROM `travel-sample` t
WHERE (SELECT RAW geo.alt FROM t.geo)[0] > 6000;
Implicit alias

When explicit alias is not defined, every identifier will have an implicit alias predefined with the same name as the identifier. Implicit alias of a nested path is defined as the last component in the path. In above example Q5D, the implicit alias of the nested path t.geo in subquery is geo, and in example Q5B the implicit alias of t.geo.alt is alt.

For example, the following example Q6 has no explicit alias for `travel-sample`. So the `travel-sample` used in subquery FROM clause is considered keyspace identifier, but not an expression. That makes the subquery non-correlated by the FROM clause, and the subquery returns all documents from keyspace `travel-sample`.
Example Q6:

SELECT array_length((SELECT RAW t1.geo.alt 
                     FROM `travel-sample` t1))
FROM `travel-sample` LIMIT 4;

[
  {
    "$1": 31596
  },
  ...
]
Contrast Q6 with Q6A below, the subquery is correlated by using the keyspace alias in the FROM clause. The result is only 1 because the subquery is applicable to only the current document t being processed in the parent query.
Example Q6A:

SELECT array_length((SELECT RAW t1.geo.alt FROM t t1))
FROM `travel-sample` t;

[
  {
    "$1": 1
  },
  ...
]
Further, the subquery is required to alias its from-term to avoid conflict with the same identifier in both outer and inner queries. For example, the following example Q6B shows an error:
Example Q6B:

SELECT array_length((SELECT RAW t1.geo.alt 
                     FROM `travel-sample`))
FROM `travel-sample`;

[
  {
    "code": 4020,
    "msg": "Duplicate subquery alias travel-sample",
    "query_from_user": "SELECT array_length((SELECT RAW t1.geo.alt \nFROM `travel-sample` ))\nFROM `travel-sample`;"
  }
]
An implicit keyspace alias is not considered as an expression, as in the above example Q6. However, the nested paths are expressions. In the following example Q6C, the from-term has nested path `travel-sample`.geo as expression where `travel-sample` is referring to the implicit alias of the keyspace in the parent query. Hence this is a correlated subquery, and the result is 1 corresponds to the current document in the parent query.
Example Q6C:

SELECT array_length((SELECT RAW t1.alt 
                     FROM `travel-sample`.geo t1))
FROM `travel-sample`
WHERE type = "airport";

[
  {
    "$1": 1
  },
 ...
]
If the expression does not resolve to any of the variables in scope for the subquery, then that is treated as keyspace identifier and subsequently if the keyspace is not found, an error is raised. For example, in the following query `beer-sample` is not defined in the scope and is treated as a new keyspace identifier.
Example Q7:

SELECT * FROM `travel-sample` t1
WHERE t1.type = "landmark" AND 
      t1.city IN (SELECT RAW city
                  FROM `beer-sample`
                  WHERE type = "brewery");
Nested Path Expressions in Subqueries

As mentioned in the FROM clause, the from-term of both parent and subqueries allow nested path expressions over constants and subqueries. However, only subqueries allow variable expressions (including paths), that are referenced through any variables defined in scope of the subquery. This is very powerful for language expressibility, simplicity and flexibility to N1QL queries. Especially, when combined with subqueries, nested path expressions over variables extend full power of N1QL syntax to array attributes/sub-documents without losing the structure of the array elements in results, or requiring tricky processing (with UNNEST, GROUP BY, ORDER BY and so). See the examples below.

Note: The usage of nested paths over keyspace identifiers is NOT allowed in the from-terms, and it results in a syntax error. Nested paths are always considered expressions in N1QL.

Examples

Example 1

The following query is valid because the nested path in the subquery is based on the explicit alias variable k1 in scope.
SELECT * FROM keyspace1 AS  k1 
WHERE (SELECT … FROM k1.subdoc1.subdoc2.field3 …)
Example 2

The following query is invalid and raise error because the nested path is in the outermost query:
SELECT * FROM keyspace1.subdoc1.subdoc2.field3 …
The subquery is based on the keyspace identifiers. Note that, the outer query has explicit alias defined, and hence the keyspace1/keyspace2 in subquery from-term are treated as identifiers.
SELECT * FROM keyspace1 AS  k1 
WHERE (SELECT … FROM keyspace1.subdoc1.field3 …)

SELECT * FROM keyspace1 AS  k1 
WHERE (SELECT … FROM keyspace2.subdoc2 …)
Example 3

The following example shows usage of nested path over subquery expression.
SELECT x.alt 
FROM (SELECT geo from `travel-sample` 
      WHERE type = "airport")[*].geo AS x
LIMIT 2;

[
  {
    "alt": 12
  },
  {
    "alt": 295
  }
]
Example 4

The following example shows usage of nested path over constant expression.
SELECT x 
FROM [{"a" : 1, "b" : {"c" : 2}},
      {"a" : 3, "b" : {"d" : 4}}][*].b AS x
LIMIT 2;

[
  {
    "x": {
      "c": 2
    }
  },
  {
    "x": {
      "d": 4
    }
  }
]
Example 5

The following two queries show valid and invalid examples with travel-sample data set. Note the nested paths (in bold) used in the FROM clause of the subquery.
Example Q8: Find airports that are at altitudes more than 4000ft

SELECT t1.city, t1.geo.alt
FROM `travel-sample` t1
WHERE t1.type = "airport" AND 
     (SELECT RAW t2.alt 
      FROM `travel-sample`.geo t2)[0] > 4000;
[
  {
    "code": 3000,
    "msg": "Ambiguous reference to field travel-sample.",
    "query_from_user": "SELECT t1.city, t1.geo.alt\nFROM `travel-sample` t1\nWHERE t1.type = \"airport\" AND \n(SELECT RAW t2.alt \n FROM `travel-sample`.geo t2)[0] > 4000;"
  }
]
Example Q8A: Q8 rewritten to use nested path using outer query variable t1

SELECT t1.city, t1.geo.alt
FROM `travel-sample` t1
WHERE t1.type = "airport" AND 
      (SELECT RAW t2.alt FROM t1.geo t2)[0] > 4000;
[
  {
    "alt": 6537,
    "city": "Grants"
  },
  {
    "alt": 5045,
    "city": "Prescott"
  }, ...
]
Example 6

The following example demonstrates the power of using nested path expressions in correlated subqueries over the array subdocument `travel-sample`.reviews. The example Q9 finds the top 10 hotels and number of reviewers which have Overall rating at least 4 and rated by minimum 6 people.
Query Q9:

SELECT name, cnt_reviewers
FROM `travel-sample` AS t 
LET cnt_reviewers = (SELECT raw count(*)
                     FROM t.reviews AS s 
                     WHERE s.ratings.Overall >= 4)[0] 
WHERE type = "hotel" and cnt_reviewers >= 6
ORDER BY cnt_reviewers DESC
LIMIT 10;

[
  {
    "cnt_reviewers": 9,
    "name": "Cadogan Hotel"  
  },
  {
    "cnt_reviewers": 9,
    "name": "Holiday Inn London Kensington Forum"
  },
  ...
]
Query Q9A:

SELECT name, cnt_reviewers
FROM   `travel-sample` AS t
LET cnt_reviewers = (SELECT raw count(*)
                     FROM `travel-sample` tmp
                     USE KEYS meta(t).id 
                     UNNEST tmp.reviews s                      WHERE s.ratings.Overall >= 4)[0] 
WHERE type = "hotel" and cnt_reviewers >= 6
ORDER BY cnt_reviewers DESC
LIMIT 10;
The above example Q9A is Couchbase Server 4.x equivalent of example Q6 that does not use nested paths in subquery FROM clause. In Couchbase Server version 4.x the FROM clause of the subquery can only have either keyspace identifier name or a subquery. Therefore, it requires a USE KEYS clause on the same document as in the outer query, that is meta(t).id where t refers to the outer query document. Nested paths cannot be used in the FROM clause and hence UNNEST is used on nested structures.