FROM clause

FROM clause

The FROM clause of a SELECT query or subquery defines the keyspaces and the source of input documents or objects for the query. Every FROM clause specifies one or more keyspaces. The first keyspace is called the primary keyspace. This is an optional clause for your query. If this clause is omitted, the input for the query is a single empty object. You can perform calculations with the SELECT statement if the FROM clause is omitted.

The from-term defines the input objects for the query and it can be either a keyspace identifier or an expression. When using nested subqueries, the from-term of the outermost parent query can only have a keyspace identifier, a subquery, or a constant expression since the from-term needs to independently produce input documents for the query. However, the subqueries can use generic variable expressions, which may be dependent on the Variables in Scope of a Subquery. For more details, see Subqueries.

Note: Couchbase Server version 4.x supports only keyspace identifier or a subquery in the from-term, but not expressions. Couchbase Server version 4.6.2 adds support for generic expression in the from-term.



FROM from-term

where from-term has the following syntax:


from-term::= keyspace-ref [ [ AS ] alias ] [ use-keys-clause ]  
                        | “(” select “)” [ AS ] alias
                        | expr [ AS ] alias
                        | from-term ( join-clause | nest-clause | unnest-clause )   

from-path:

[ namespace : ] path 

namespace:

identifier

expr:

see N1QL expression

use-keys-clause:

USE [ PRIMARY ] KEYS expression 

join-clause:

( lookup-join | index-join )

lookup-join:

[ join-type ] JOIN from-path [ [ AS ] alias ] on-keys-clause

index-join:

[ join-type ] JOIN from-path [ [ AS ] alias ] ( on-keys-clause | on-key-for-clause )

join-type:

INNER | LEFT [ OUTER ]  

on-keys-clause:

ON [ PRIMARY ] KEYS expression  

on-key-for-clause:

ON [ PRIMARY ] KEY rhs-expression.lhs-expression-key FOR lhs-expression 

rhs-expression: keyspace or expression corresponding to the right hand side of JOIN.

lhs-expression: keyspace or expression corresponding to the left hand side of JOIN.

lhs-expression-key: attribute in rhs-expression referrencing primary key for lhs-expression.

nest-clause:

[ join-type ] NEST from-path [ ( [ AS ] alias ) ] on-keys-clause 

unnest-clause:

[ join-type ] [ UNNEST | FLATTEN ] expression [ ( [ AS ] alias ) ]

Omitted FROM clause

If the FROM clause is omitted, the data source is equivalent to an array containing a single empty object. This allows you to evaluate expressions that do not depend on stored data.

Evaluating an expression SELECT 10 + 20 produces the following result:

[ { "$1" : 30 } ] 

Counting the number of inputs SELECT COUNT(*) AS input_count produces the following result:

[ { "input_count" : 1 } ]

Getting the input contents SELECT * produces the following result:

[ { } ]

Keyspaces (buckets)

The simplest type of FROM clause specifies a keyspace (bucket):

SELECT * FROM customer

This returns every value in the customer keyspace.

The keyspace can be prefixed with an optional namespace (pool):

SELECT * FROM main:customer

This queries the customer keyspace in the main namespace.

If the namespace is omitted, the default namespace in the current session is used.

Keyspace Identifier

This is the name or identifier of an independent keyspace that can serve as a data source or bucket of one or more documents. Such keyspace is not dependent on any of the Variables in Scope of a Subquery.

The following example Q1 uses the keyspace `travel-sample`.
Example Q1:

SELECT t1.city
FROM `travel-sample` t1
WHERE t1.type = "landmark" LIMIT 4;

The following non-correlated subquery example Q2 uses the keyspace `travel-sample` independent of the same keyspace used in the outer query:

Example Q2: Find cities that have landmarks and airports.

SELECT t1.city
FROM `travel-sample` t1
WHERE t1.type = "landmark" AND 
      t1.city IN (SELECT RAW city
                  FROM `travel-sample` 
                  WHERE type = "airport");

The following correlated subquery Q2A uses keyspace alias from outer query.

Example Q2A: Find cities that have museum landmarks.

SELECT t1.city, t1.name
FROM `travel-sample` t1
WHERE t1.type = "landmark" AND 
      (SELECT raw t2
       FROM split(t1.name) t2 
       WHERE t2 = "museum")[0] is not null;

The following subquery example Q2B uses different keyspaces in outer/inner queries.

Example Q2B: Find cities that have landmarks and breweries.

SELECT t1.city
FROM `travel-sample` t1
WHERE t1.type = "landmark" AND 
      t1.city IN (SELECT RAW b1.city
                  FROM `beer-sample` b1
                  WHERE b1.type = "brewery");

N1QL Expression

Couchbase Server version 4.6.2 adds support for generic expressions in the from-term. This is a very powerful functionality as it enables usage of various N1QL functions, operators, path expressions, language constructs on constant expressions, variables, and subqueries.
  • When the from-term is an expression, USE KEYS or USE INDEX clauses are not allowed.

  • When using JOIN clause, NEST clause, or UNNEST clause, the left side keyspace can be an expression or subquery, but the right side keyspace must be a keyspace identifier.

    Independent Constant Expression
    This includes any N1QL expressions of JSON scalar values, static JSON literals, objects, or N1QL functions. For example:
    SELECT * FROM [1, 2, "name", { "type" : "airport", "id" : "SFO"}]  AS  ks1
    Variable N1QL Expression
    This includes expressions that refers to any variables in scope for the query. This is applicable to only subqueries because the outermost level query cannot use any variables in its own FROM clause. This makes the subquery correlated with outer queries, as explained in the Subqueries section.
    Subquery and Subquery Expressions
    A subquery itself can appear as from-term expression. In this case, the subquery results are fed as data source to the outer query. Further, subqueries can occur as a subquery, or as constituent part of a bigger N1QL expression.

    For example, the following example Q3 shows a simple case of using subquery in FROM clause, though N1QL is powerful enough to express it without a subquery (as shown in example Q3A):

    Example Q3: Find highest altitude airports/cities in each country
     
    SELECT t1.country, t1.max_country_alt, 
           ARRAY x.city FOR x IN t1.c 
           WHEN x.alt = t1.max_country_alt END
    FROM (SELECT  country, array_agg({"alt": geo.alt , city}) c,
       max_country_alt
          FROM `travel-sample` 
          WHERE type = "airport"
          GROUP BY country 
    LETTING max_country_alt = max(geo.alt) ) t1;
    Example Q3A: Query Q3 without using subqueries
    
    SELECT  country, 
            (ARRAY x.city FOR x IN array_agg({"alt": geo.alt, city}) 
            WHEN x.alt = max_country_alt END) AS max_alt_city,
            max_country_alt
    FROM `travel-sample` 
    WHERE type = "airport"
    GROUP BY country 
    LETTING max_country_alt = max(geo.alt);
    A more realistic example of needing subquery in the FROM clause arises in more complicated scenarios such as when multiple levels of aggregates (sort orders and limits) are required, or when the subquery results of one keyspace may need to be JOINed with another keyspace.
    For example, the following example Q4 finds for each country, total number of airports at different altitudes, and the total number of corresponding cities with those airports. In this case, the inner query finds first level of grouping of different altitudes by country, and corresponding number of cities. Then the outer query builds on the inner query results to count number of different altitude groups for each country, and the total number of cities.
    Example Q4:
    
    SELECT t1.country, num_alts, total_cities
    FROM (SELECT country, geo.alt AS alt,
                 count(city) AS num_cities
          FROM `travel-sample` 
          WHERE type = "airport"
          GROUP BY country, geo.alt) t1
    GROUP BY t1.country 
    LETTING num_alts = count(t1.alt), total_cities = sum(t1.num_cities);
    
    [
      {
        "country": "United Kingdom",
        "num_alts": 128,
        "total_cities": 187
      },
      {
        "country": "France",
        "num_alts": 196,
        "total_cities": 221
      },
      {
        "country": "United States",
        "num_alts": 946,
        "total_cities": 1560
      }
    ]
    This is equivalent to blending the results of the following two queries by country, but the subquery in the from-term simplified it above.
    SELECT country,count(city) AS num_cities
    FROM `travel-sample` 
    WHERE type = "airport"
    GROUP BY country;
    
    SELECT country, count(distinct geo.alt) AS num_alts
    FROM `travel-sample` 
    WHERE type = "airport"
    GROUP BY country;
    See Subqueries for details and examples.
    Nested Path Expressions
    Expressions used in the from-term can have nested paths, including constant or variable or subquery expressions. Similarly, variable expressions are allowed only subqueries and not in outermost parent queries.
    Further, the nested path variable expression in subquery from-term must resolve to variables/aliases in scope, and not to any keyspace/bucket identifiers. Otherwise, it results in a syntax error. Note that, when an expression cannot be resolved to any variables in scope, it is considered keyspace identifier. See Nested Paths in Subqueries for more details.

    The following example shows usage of nested path over subquery expression:

    SELECT x.alt 
    FROM (SELECT geo from `travel-sample` 
          WHERE type = "airport")[*].geo AS x
    LIMIT 2;
    
    [
      {
        "alt": 12
      },
      {
        "alt": 295
      }
    ]
    The following example shows usage of nested path over constant expression:
    SELECT x FROM 
             [{"a" : 1, "b" : {"c" : 2}},
              {"a" : 3, "b" : {"d" : 4}}][*].b AS x
    LIMIT 2;
    
    [
      {
        "x": {
          "c": 2
        }
      },
      {
        "x": {
          "d": 4
        }
      }
    ]

USE KEYS Clause

Specific primary keys within a keyspace (bucket) can be specified. Only values having those primary keys will be included as inputs to the query.

To specify a single key:

SELECT * FROM customer USE KEYS "acme-uuid-1234-5678"

To specify multiple keys:

SELECT * FROM customer USE KEYS [ "acme-uuid-1234-5678", "roadster-uuid-4321-8765" ]

In the FROM clause of a subquery, USE KEYS is mandatory for the primary keyspace.

JOIN Clause

The JOIN clause enables you to create new input objects by combining two or more source objects. N1QL supports two types of joins: lookup joins and index joins.

Lookup Joins

Lookup joins allow only left-to-right JOINs, which means each qualified document from the left hand side (LHS) of the JOIN operator is required to produce primary keys of documents on the right hand side (RHS). These keys are subsequently nested-loop-joined to access qualified RHS documents. Couchbase Server version 4.1 and earlier supported only lookup joins.

The joins-clause is optional, and follows the FROM clause; it allows you to combine two or more source objects to use as input objects. The KEYS clause is required after each JOIN. It specifies the primary keys for the second keyspace in the join.

Here is the syntax for the lookup join clause:

[ join-type ] JOIN from-path [ [ AS ] alias ] keys-clause

where join-type [ LEFT ] is [ INNER | OUTER ] and from-path is as discussed in the "from-path" section.

Lookup joins can be chained. By default, an INNER join is performed. This means that for each joined object produced, both the left- and right-hand source objects must be non-missing and non-null.

If LEFT or LEFT OUTER is specified, then a left outer join is performed. At least one joined object is produced for each left-hand source object. If the right-hand source object is NULL or MISSING, then the joined object's right-hand side value is also NULL or MISSING (omitted), respectively.

The KEYS clause is required after each JOIN. It specifies the primary keys for the second keyspace in the join.

For example, if our customer objects were:
  {
     "name": ...,
     "primary_contact": ...,
     "address": [ ... ]
     }          
And our invoice objects were:
   {
      "customer_key": ...,
      "invoice_date": ...,
      "invoice_item_keys": [ ... ],
      "total": ...
      }
And the FROM clause was:
FROM invoice inv JOIN customer cust ON KEYS inv.customer_key     
Then each joined object would be:
    {
        "inv" : {
            "customer_key": ...,
            "invoice_date": ...,
            "invoice_item_keys": [ ... ],
            "total": ...
        },
        "cust" : {
            "name": ...,
            "primary_contact": ...,
            "address": [ ... ]
        }
        }      
If our invoice_item objects were:
   {
        "invoice_key": ...,
        "product_key": ...,
        "unit_price": ...,
        "quantity": ...,
        "item_subtotal": ...
        }          
And the FROM clause was:
FROM invoice JOIN invoice_item item ON KEYS invoice.invoice_item_keys
Then our joined objects would be:
   {
        "invoice" : {
            "customer_key": ...,
            "invoice_date": ...,
            "invoice_item_keys": [ ... ],
            "total": ...
        },
        "item" : {
            "invoice_key": ...,
            "product_key": ...,
            "unit_price": ...,
            "quantity": ...,
            "item_subtotal": ...
        }
    },
    {
        "invoice" : {
            "customer_key": ...,
            "invoice_date": ...,
            "invoice_item_keys": [ ... ],
            "total": ...
        },
        "item" : {
            "invoice_key": ...,
            "product_key": ...,
            "unit_price": ...,
            "quantity": ...,
            "item_subtotal": ...
        }
    },
    ...

ON KEYS is required after each JOIN. It specifies the primary keys for the second keyspace (bucket) in the join.

Joins can be chained.

By default, an INNER join is performed. This means that for each joined object produced, both the left and right hand source objects must be non-missing and non-null.

If LEFT or LEFT OUTER is specified, then a left outer join is performed. At least one joined object is produced for each left hand source object. If the right hand source object is NULL or MISSING, then the joined object's right-hand side value is also NULL or MISSING (omitted), respectively.

Index Joins

When using lookup joins, right-to-left JOINs with RHS documents containing primary key references to LHS documents cannot be joined efficiently using any index. For example, consider the beer-sample data with beer and brewery documents, where beer.brewery_id is the primary key of brewery documents and brewery documents have no reference to beer documents. The following query to get a list of beers from brewers in California cannot be efficiently executed without making a Cartesian product of all beer documents (LHS) with all brewery documents (RHS):
SELECT * FROM `beer-sample` beer JOIN `beer-sample` brewery ON KEYS beer.brewery_id WHERE beer.type="beer" AND brewery.type="brewery" AND brewery.state="California";
This query cannot use any index on brewery to directly access breweries in California because "brewery" is on the RHS. Also, you cannot rewrite the query to put the brewery document on the LHS (to use any index) and the beer document on the RHS because the brewery documents (on the LHS) have no primary keys to access the beer documents (on the RHS).
Using the new index joins, the same query can be written as:
Required Index:
CREATE INDEX beer_brewery_id ON `beer-sample`(brewery_id) WHERE type="beer";
                 
Optional index:
CREATE INDEX brewery_state ON `beer-sample`(state) WHERE type="brewery";
                 
SELECT * FROM  `beer-sample` brewery JOIN `beer-sample` beer 
    ON KEY  beer.brewery_id  FOR brewery 
    WHERE  beer.type="beer" AND brewery.type="brewery" AND brewery.state="California";
If you generalize the same query, it looks like the following:
CREATE INDEX on-key-for-index-name rhs-expression (lhs-expression-key);
                 
SELECT projection-list FROM lhs-expression JOIN rhs-expression
    ON KEY rhs-expression.lhs-expression-key FOR lhs-expression      
   [ WHERE predicates ] ; 
There are three important changes in the index scan syntax example above:
  • CREATE INDEX on the ON KEY expression beer_brewery_id to access beer documents using brewery_id (which are produced on the LHS).
  • The ON KEY beer.brewery_id FOR brewery. This enables N1QL to use the index beer_brewery_id.
  • Create any optional index such as brewery.state that can be used on brewery (LHS).
Note: For index joins, the syntax uses ON KEY (singular) instead of ON KEYS (plural). This is because for index joins, the ON KEY expression must produce a scalar value. For lookup joins, the ON KEYS expression can produce either a scalar or an array value.

Syntax of Index Joins

A new on-key-for-clause is added to the join-clause of the FROM syntax.

[ join-type ] JOIN from-path [ [ AS ] alias ] < on-keys-clause | on-key-for-clause >

on-keys-clause:

ON [ PRIMARY ] KEYS expression 

on-key-for-clause

ON [ PRIMARY ] KEY rhs-expression.lhs-expression-key FOR lhs-expression 

rhs-expression: keyspace or expression corresponding to the right hand side of JOIN.

lhs-expression: keyspace or expression corresponding to the left hand side of JOIN.

lhs-expression-key: attribute in rhs-expression referrencing primary key for lhs-expression.

Example

The following example shows the statement to fetches the user name from a given document. In this example, the documents doc1 and doc2 are in the same bucket and contain the following information:

Doc1 document name = message1
[
  {
    "uid": "1",
    "message": "hello",
    "sent_by": "username"
  }
]
Doc2 document name = user1

[
  {
    "username": "username"
  }
  ]   

Assuming that the name of Doc2 is based on the uid of Doc1, that is, "user1" is based on "uid=1", you can use the following statement to fetch the user name:

JOIN statement

SELECT d1.uid, d1.message, d1.sent_by, d2.username
FROM mybucket d1 USE KEYS "message1"
JOIN mybucket d2 ON KEYS "user" || d1.uid;                   

UNNEST Clause

If a document or object contains a nested array, UNNEST conceptually performs a join of the nested array with its parent object. Each resulting joined object becomes an input to the query. Unnests can be chained.

Here is the syntax for an UNNEST join:

[ join-type ] UNNEST path [ [ AS ] alias ]

where join-type is [ INNER | LEFT [ OUTER ] ]

The first path element after each UNNEST must reference some preceding path.

By default, an INNER unnest is performed. This means that for each result object produced, both the left-hand and right-hand source objects must be non-missing and non-null.

If LEFT or LEFT OUTER is specified, then a left outer unnest is performed. At least one result object is produced for each left source object. If the right-hand source object is NULL, MISSING, empty, or a non-array value, then the result object's right side value is MISSING (omitted).

Example

If some customer documents contain an array of addresses under the address field, the following query retrieves each nested address along with the parent customer's name.

SELECT c.name, a.* FROM customer c UNNEST c.address a 

Here is the result set:

 [
    { "name" : "Acme Inc.", "street" : "101 Main St.", 
        "zip" : "94040" },
    { "name" : "Acme Inc.", "street" : "300 Broadway", 
	"zip" : "10011" },
    { "name" : "Roadster Corp.", "street" : "3500 Wilshire Blvd.",
        "zip" : "90210" },
    { "name" : "Roadster Corp.", "street" : "4120 Alamo Dr.", 
	"zip" : "75019" }     
] 

In the following example, The UNNEST clause iterates over the reviews array and collects the reviewerName and publication from each element in the array. This collection of objects can be used as input for other query operations.

SELECT review.reviewerName, review.publication
   FROM beers AS b
      UNNEST review IN b.reviews   

Here is the result set:

{"id": "7983345",
 "name": "Takayama Pale Ale",
 "brewer": "Hida Takayama Brewing Corp.",
 "reviews" : [ 
   {"reviewerName" : "Takeshi Kitano",
         "publication" : "Outdoor Japan Magazine","date": "3/2013"},
   {"reviewerName" : "Moto Ohtake", "publication" : "Japan Beer Times",
         "date" : "7/2013"} 
            ]
}         

NEST Clause

[ join-type ] NEST from-path [ [ AS ] alias ] keys-clause

where join-type is [ INNER | LEFT [ OUTER ] ]

Nesting is conceptually the inverse of unnesting. Nesting performs a join across two keyspaces. But instead of producing a cross-product of the left and right inputs, a single result is produced for each left input, while the corresponding right inputs are collected into an array and nested as a single array-valued field in the result object.

Nests can be chained with other NEST, JOIN, and UNNEST clauses. By default, an INNER nest is performed. This means that for each result object produced, both the left and right source objects must be non-missing and non-null. The right-hand side result of NEST is always an array or MISSING. If there is no matching right source object, then the right source object is as follows:

  • If the ON KEYS expression evaluates to MISSING, the right value is also MISSING.
  • If the ON KEYS expression evaluates to NULL, the right value is MISSING.
  • If the ON KEYS expression evaluates to an array, the right value is an empty array.
  • If the ON KEYS expression evaluates to a non-array value, the right value is an empty array.

If LEFT or LEFT OUTER is specified, then a left outer nest is performed. One result object is produced for each left source object.

Example

This example shows the NEST clause using invoice and invoice_item_ objects.

Recall our invoice objects:

{
     "customer_key": ...,
     "invoice_date": ...,
     "invoice_item_keys": [ ... ],
     "total": ...     
 }  

And our invoice_item objects:

{
         "invoice_key": ...,
         "product_key": ...,
         "unit_price": ...,
         "quantity": ...,
         "item_subtotal": ...     
}  

If the FROM clause was:

FROM invoice inv NEST invoice_item items ON KEYS inv.invoice_item_keys 

The results would be:

  {
         "invoice" : {
             "customer_key": ...,
             "invoice_date": ...,
             "invoice_item_keys": [ ... ],
             "total": ...
         },
         "items" : [
             {
                 "invoice_key": ...,
                 "product_key": ...,
                 "unit_price": ...,
                 "quantity": ...,
                 "item_subtotal": ...
             },
             {
                 "invoice_key": ...,
                 "product_key": ...,
                 "unit_price": ...,
                 "quantity": ...,
                 "item_subtotal": ...
             }
         ]
     },
     {
         "invoice" : {
             "customer_key": ...,
             "invoice_date": ...,
             "invoice_item_keys": [ ... ],
             "total": ...
         },
         "items" : [
             {
                 "invoice_key": ...,
                 "product_key": ...,
                 "unit_price": ...,
                 "quantity": ...,
                 "item_subtotal": ...
             },
             {
                 "invoice_key": ...,
                 "product_key": ...,
                 "unit_price": ...,
                 "quantity": ...,
                 "item_subtotal": ...
             }
         ]
     },
     ... 

Arrays

If an array occurs along a path, you can use array subscripts to select one element.

In the following statement, the entire address array is selected for each customer.

SELECT a FROM customer.address a;    
[
    {
        "a": [
                 { "street" : "101 Main St.", "zip" : "94040" },
                 { "street" : "300 Broadway", "zip" : "10011" }
             ]
    },
    {
        "a": [
                { "street" : "3500 Wilshire Blvd.", "zip" : "90210" },
                { "street" : "4120 Alamo Dr.", "zip" : "75019" }
             ]
    }
] 

The following example uses array subscripts to select the first element of the address array for each customer.

SELECT * FROM customer.address[0]  
[
     { "street" : "101 Main St.", "zip" : "94040" },
     { "street" : "3500 Wilshire Blvd.", "zip" : "90210" }
]  

AS Keyword

Like SQL, N1QL allows renaming fields using the AS keyword. However, N1QL also allows reshaping the data, which has no analog in SQL. To do this, you embed the attributes of the statement in the desired result object shape.

Aliases

Aliases in the FROM clause create new names that can be referred to anywhere in the query. When an alias conflicts with a keyspace or field name in the same scope, the identifier always refers to the alias. This allows for consistent behavior in scenarios where an identifier only conflicts in some documents. For more information on aliases, see Identifiers.