FROM clause

FROM clause

The FROM clause defines the keyspaces and input objects for the query.

This is an optional clause for your query. If this clause is omitted, the input for the query is a single empty object. You can perform calculations with the SELECT statement if the FROM clause is omitted.

Every FROM clause specifies one or more keyspaces. The first keyspace is called the primary keyspace.

FROM from-term

where from-term has the following syntax:

from-path [ ( [ AS ] alias ) ] [ use-keys-clause ] | 
from-term join-clause | 
from-term nest-clause | 
from-term unnest-clause   

from-path:

[ namespace : ] path 

namespace:

identifier

use-keys-clause:

USE [ PRIMARY ] KEYS expression 

join-clause:

( lookup-join | index-join )

lookup-join:

[ join-type ] JOIN from-path [ [ AS ] alias ] on-keys-clause

index-join:

[ join-type ] JOIN from-path [ [ AS ] alias ] ( on-keys-clause | on-key-for-clause )

join-type:

INNER | LEFT [ OUTER ]  

on-keys-clause:

ON [ PRIMARY ] KEYS expression  

on-key-for-clause:

ON [ PRIMARY ] KEY rhs-expression.lhs-expression-key FOR lhs-expression 

rhs-expression: keyspace or expression corresponding to the right hand side of JOIN.

lhs-expression: keyspace or expression corresponding to the left hand side of JOIN.

lhs-expression-key: attribute in rhs-expression referrencing primary key for lhs-expression.

nest-clause:

[ join-type ] NEST from-path [ ( [ AS ] alias ) ] on-keys-clause 

unnest-clause:

[ join-type ] [ UNNEST | FLATTEN ] expression [ ( [ AS ] alias ) ]

Omitted FROM clause

If the FROM clause is omitted, the data source is equivalent to an array containing a single empty object. This allows you to evaluate expressions that do not depend on stored data.

Evaluating an expression SELECT 10 + 20 produces the following result:

[ { "$1" : 30 } ] 

Counting the number of inputs SELECT COUNT(*) AS input_count produces the following result:

[ { "input_count" : 1 } ]

Getting the input contents SELECT * produces the following result:

[ { } ]

Keyspaces (buckets)

The simplest type of FROM clause specifies a keyspace (bucket):

SELECT * FROM customer

This returns every value in the customer keyspace.

The keyspace can be prefixed with an optional namespace (pool):

SELECT * FROM main:customer

This queries the customer keyspace in the main namespace.

If the namespace is omitted, the default namespace in the current session is used.

USE KEYS Clause

Specific primary keys within a keyspace (bucket) can be specified. Only values having those primary keys will be included as inputs to the query.

To specify a single key:

SELECT * FROM customer USE KEYS "acme-uuid-1234-5678"

To specify multiple keys:

SELECT * FROM customer USE KEYS [ "acme-uuid-1234-5678", "roadster-uuid-4321-8765" ]

In the FROM clause of a subquery, USE KEYS is mandatory for the primary keyspace.

Nested Paths

Nested paths within keyspaces can be specified using the period [.] as a level designation.

For each document in the keyspace, the path is evaluated and its value becomes an input to the query. For a given document, if any element of the path is NULL or MISSING, that document is skipped and does not contribute any inputs to the query.

For example, if some customer documents contain a primary_contact object, the following query can retrieve them:

SELECT * FROM customer.primary_contact

Here is the result set:

[
    {"name" : "John Smith", "phone" : "+1-650-555-1234", 
      "address" : { ... } },
    {"name" : "Jane Brown", "phone" : "+1-650-555-5678", 
      "address" : { ... } }     
]  

Nested paths can have arbitrary depth and can include array subscripts. For example, the following query:

SELECT * FROM customer.primary_contact.address

returns this result set:

[
  { "street" : "101 Main St.", "zip" : "94040" },         
  { "street" : "3500 Wilshire Blvd.", "zip" : "90210" }     
] 

JOIN Clause

The JOIN clause enables you to create new input objects by combining two or more source objects. N1QL supports two types of joins: lookup joins and index joins.

Lookup Joins

Lookup joins allow only left-to-right JOINs, which means each qualified document from the left hand side (LHS) of the JOIN operator is required to produce primary keys of documents on the right hand side (RHS). These keys are subsequently nested-loop-joined to access qualified RHS documents. Couchbase Server version 4.1 and earlier supported only lookup joins.

The joins-clause is optional, and follows the FROM clause; it allows you to combine two or more source objects to use as input objects. The KEYS clause is required after each JOIN. It specifies the primary keys for the second keyspace in the join.

Here is the syntax for the lookup join clause:

[ join-type ] JOIN from-path [ [ AS ] alias ] keys-clause

where join-type [ LEFT ] is [ INNER | OUTER ] and from-path is as discussed in the "from-path" section.

Lookup joins can be chained. By default, an INNER join is performed. This means that for each joined object produced, both the left- and right-hand source objects must be non-missing and non-null.

If LEFT or LEFT OUTER is specified, then a left outer join is performed. At least one joined object is produced for each left-hand source object. If the right-hand source object is NULL or MISSING, then the joined object's right-hand side value is also NULL or MISSING (omitted), respectively.

The KEYS clause is required after each JOIN. It specifies the primary keys for the second keyspace in the join.

For example, if our customer objects were:
  {
     "name": ...,
     "primary_contact": ...,
     "address": [ ... ]
     }          
And our invoice objects were:
   {
      "customer_key": ...,
      "invoice_date": ...,
      "invoice_item_keys": [ ... ],
      "total": ...
      }
And the FROM clause was:
FROM invoice inv JOIN customer cust ON KEYS inv.customer_key     
Then each joined object would be:
    {
        "inv" : {
            "customer_key": ...,
            "invoice_date": ...,
            "invoice_item_keys": [ ... ],
            "total": ...
        },
        "cust" : {
            "name": ...,
            "primary_contact": ...,
            "address": [ ... ]
        }
        }      
If our invoice_item objects were:
   {
        "invoice_key": ...,
        "product_key": ...,
        "unit_price": ...,
        "quantity": ...,
        "item_subtotal": ...
        }          
And the FROM clause was:
FROM invoice JOIN invoice_item item ON KEYS invoice.invoice_item_keys
Then our joined objects would be:
   {
        "invoice" : {
            "customer_key": ...,
            "invoice_date": ...,
            "invoice_item_keys": [ ... ],
            "total": ...
        },
        "item" : {
            "invoice_key": ...,
            "product_key": ...,
            "unit_price": ...,
            "quantity": ...,
            "item_subtotal": ...
        }
    },
    {
        "invoice" : {
            "customer_key": ...,
            "invoice_date": ...,
            "invoice_item_keys": [ ... ],
            "total": ...
        },
        "item" : {
            "invoice_key": ...,
            "product_key": ...,
            "unit_price": ...,
            "quantity": ...,
            "item_subtotal": ...
        }
    },
    ...

ON KEYS is required after each JOIN. It specifies the primary keys for the second keyspace (bucket) in the join.

Joins can be chained.

By default, an INNER join is performed. This means that for each joined object produced, both the left and right hand source objects must be non-missing and non-null.

If LEFT or LEFT OUTER is specified, then a left outer join is performed. At least one joined object is produced for each left hand source object. If the right hand source object is NULL or MISSING, then the joined object's right-hand side value is also NULL or MISSING (omitted), respectively.

Index Joins

When using lookup joins, right-to-left JOINs with RHS documents containing primary key references to LHS documents cannot be joined efficiently using any index. For example, consider the beer-sample data with beer and brewery documents, where beer.brewery_id is the primary key of brewery documents and brewery documents have no reference to beer documents. The following query to get a list of beers from brewers in California cannot be efficiently executed without making a Cartesian product of all beer documents (LHS) with all brewery documents (RHS):
SELECT * FROM `beer-sample` beer JOIN `beer-sample` brewery ON KEYS beer.brewery_id WHERE beer.type="beer" AND brewery.type="brewery" AND brewery.state="California";
This query cannot use any index on brewery to directly access breweries in California because "brewery" is on the RHS. Also, you cannot rewrite the query to put the brewery document on the LHS (to use any index) and the beer document on the RHS because the brewery documents (on the LHS) have no primary keys to access the beer documents (on the RHS).
Using the new index joins, the same query can be written as:
Required Index:
CREATE INDEX beer_brewery_id ON `beer-sample`(brewery_id) WHERE type="beer";
                 
Optional index:
CREATE INDEX brewery_state ON `beer-sample`(state) WHERE type="brewery";
                 
SELECT * FROM  `beer-sample` brewery JOIN `beer-sample` beer 
    ON KEY  beer.brewery_id  FOR brewery 
    WHERE  beer.type="beer" AND brewery.type="brewery" AND brewery.state="California";
If you generalize the same query, it looks like the following:
CREATE INDEX on-key-for-index-name rhs-expression (lhs-expression-key);
                 
SELECT projection-list FROM lhs-expression JOIN rhs-expression
    ON KEY rhs-expression.lhs-expression-key FOR lhs-expression      
   [ WHERE predicates ] ; 
There are three important changes in the index scan syntax example above:
  • CREATE INDEX on the ON KEY expression beer_brewery_id to access beer documents using brewery_id (which are produced on the LHS).
  • The ON KEY beer.brewery_id FOR brewery. This enables N1QL to use the index beer_brewery_id.
  • Create any optional index such as brewery.state that can be used on brewery (LHS).
Note: For index joins, the syntax uses ON KEY (singular) instead of ON KEYS (plural). This is because for index joins, the ON KEY expression must produce a scalar value. For lookup joins, the ON KEYS expression can produce either a scalar or an array value.

Syntax of Index Joins

A new on-key-for-clause is added to the join-clause of the FROM syntax.

[ join-type ] JOIN from-path [ [ AS ] alias ] < on-keys-clause | on-key-for-clause >

on-keys-clause:

ON [ PRIMARY ] KEYS expression 

on-key-for-clause

ON [ PRIMARY ] KEY rhs-expression.lhs-expression-key FOR lhs-expression 

rhs-expression: keyspace or expression corresponding to the right hand side of JOIN.

lhs-expression: keyspace or expression corresponding to the left hand side of JOIN.

lhs-expression-key: attribute in rhs-expression referrencing primary key for lhs-expression.

Example

The following example shows the statement to fetches the user name from a given document. In this example, the documents doc1 and doc2 are in the same bucket and contain the following information:

Doc1 document name = message1
[
  {
    "uid": "1",
    "message": "hello",
    "sent_by": "username"
  }
]
Doc2 document name = user1

[
  {
    "username": "username"
  }
  ]   

Assuming that the name of Doc2 is based on the uid of Doc1, that is, "user1" is based on "uid=1", you can use the following statement to fetch the user name:

JOIN statement

SELECT d1.uid, d1.message, d1.sent_by, d2.username
FROM mybucket d1 USE KEYS "message1"
JOIN mybucket d2 ON KEYS "user" || d1.uid;                   

UNNEST Clause

If a document or object contains a nested array, UNNEST conceptually performs a join of the nested array with its parent object. Each resulting joined object becomes an input to the query. Unnests can be chained.

Here is the syntax for an UNNEST join:

[ join-type ] UNNEST path [ [ AS ] alias ]

where join-type is [ INNER | LEFT [ OUTER ] ]

The first path element after each UNNEST must reference some preceding path.

By default, an INNER unnest is performed. This means that for each result object produced, both the left-hand and right-hand source objects must be non-missing and non-null.

If LEFT or LEFT OUTER is specified, then a left outer unnest is performed. At least one result object is produced for each left source object. If the right-hand source object is NULL, MISSING, empty, or a non-array value, then the result object's right side value is MISSING (omitted).

Example

If some customer documents contain an array of addresses under the address field, the following query retrieves each nested address along with the parent customer's name.

SELECT c.name, a.* FROM customer c UNNEST c.address a 

Here is the result set:

 [
    { "name" : "Acme Inc.", "street" : "101 Main St.", 
        "zip" : "94040" },
    { "name" : "Acme Inc.", "street" : "300 Broadway", 
	"zip" : "10011" },
    { "name" : "Roadster Corp.", "street" : "3500 Wilshire Blvd.",
        "zip" : "90210" },
    { "name" : "Roadster Corp.", "street" : "4120 Alamo Dr.", 
	"zip" : "75019" }     
] 

In the following example, The UNNEST clause iterates over the reviews array and collects the reviewerName and publication from each element in the array. This collection of objects can be used as input for other query operations.

SELECT review.reviewerName, review.publication
   FROM beers AS b
      UNNEST review IN b.reviews   

Here is the result set:

{"id": "7983345",
 "name": "Takayama Pale Ale",
 "brewer": "Hida Takayama Brewing Corp.",
 "reviews" : [ 
   {"reviewerName" : "Takeshi Kitano",
         "publication" : "Outdoor Japan Magazine","date": "3/2013"},
   {"reviewerName" : "Moto Ohtake", "publication" : "Japan Beer Times",
         "date" : "7/2013"} 
            ]
}         

NEST Clause

[ join-type ] NEST from-path [ [ AS ] alias ] keys-clause

where join-type is [ INNER | LEFT [ OUTER ] ]

Nesting is conceptually the inverse of unnesting. Nesting performs a join across two keyspaces. But instead of producing a cross-product of the left and right inputs, a single result is produced for each left input, while the corresponding right inputs are collected into an array and nested as a single array-valued field in the result object.

Nests can be chained with other NEST, JOIN, and UNNEST clauses. By default, an INNER nest is performed. This means that for each result object produced, both the left and right source objects must be non-missing and non-null. The right-hand side result of NEST is always an array or MISSING. If there is no matching right source object, then the right source object is as follows:

  • If the ON KEYS expression evaluates to MISSING, the right value is also MISSING.
  • If the ON KEYS expression evaluates to NULL, the right value is MISSING.
  • If the ON KEYS expression evaluates to an array, the right value is an empty array.
  • If the ON KEYS expression evaluates to a non-array value, the right value is an empty array.

If LEFT or LEFT OUTER is specified, then a left outer nest is performed. One result object is produced for each left source object.

Example

This example shows the NEST clause using invoice and invoice_item_ objects.

Recall our invoice objects:

{
     "customer_key": ...,
     "invoice_date": ...,
     "invoice_item_keys": [ ... ],
     "total": ...     
 }  

And our invoice_item objects:

{
         "invoice_key": ...,
         "product_key": ...,
         "unit_price": ...,
         "quantity": ...,
         "item_subtotal": ...     
}  

If the FROM clause was:

FROM invoice inv NEST invoice_item items ON KEYS inv.invoice_item_keys 

The results would be:

  {
         "invoice" : {
             "customer_key": ...,
             "invoice_date": ...,
             "invoice_item_keys": [ ... ],
             "total": ...
         },
         "items" : [
             {
                 "invoice_key": ...,
                 "product_key": ...,
                 "unit_price": ...,
                 "quantity": ...,
                 "item_subtotal": ...
             },
             {
                 "invoice_key": ...,
                 "product_key": ...,
                 "unit_price": ...,
                 "quantity": ...,
                 "item_subtotal": ...
             }
         ]
     },
     {
         "invoice" : {
             "customer_key": ...,
             "invoice_date": ...,
             "invoice_item_keys": [ ... ],
             "total": ...
         },
         "items" : [
             {
                 "invoice_key": ...,
                 "product_key": ...,
                 "unit_price": ...,
                 "quantity": ...,
                 "item_subtotal": ...
             },
             {
                 "invoice_key": ...,
                 "product_key": ...,
                 "unit_price": ...,
                 "quantity": ...,
                 "item_subtotal": ...
             }
         ]
     },
     ... 

Arrays

If an array occurs along a path, you can use array subscripts to select one element.

In the following statement, the entire address array is selected for each customer.

SELECT a FROM customer.address a;    
[
    {
        "a": [
                 { "street" : "101 Main St.", "zip" : "94040" },
                 { "street" : "300 Broadway", "zip" : "10011" }
             ]
    },
    {
        "a": [
                { "street" : "3500 Wilshire Blvd.", "zip" : "90210" },
                { "street" : "4120 Alamo Dr.", "zip" : "75019" }
             ]
    }
] 

The following example uses array subscripts to select the first element of the address array for each customer.

SELECT * FROM customer.address[0]  
[
     { "street" : "101 Main St.", "zip" : "94040" },
     { "street" : "3500 Wilshire Blvd.", "zip" : "90210" }
]  

AS Keyword

Like SQL, N1QL allows renaming fields using the AS keyword. However, N1QL also allows reshaping the data, which has no analog in SQL. To do this, you embed the attributes of the statement in the desired result object shape.

Aliases

Aliases in the FROM clause create new names that can be referred to anywhere in the query. When an alias conflicts with a keyspace or field name in the same scope, the identifier always refers to the alias. This allows for consistent behavior in scenarios where an identifier only conflicts in some documents. For more information on aliases, see Identifiers.