Sub-Document Operations
Version 2.2

Sub-document operations can be used to efficiently access parts of documents. They may be quicker and more network-efficient than full-document operations such as upsert, update and get because they only transmit the accessed sections of the document over the network. Sub-document operations are also atomic, allowing safe modifications to documents with built-in concurrency control.

Sub-documents

Starting with Couchbase Server 4.5 you can atomically and efficiently update and retrieve parts of a document. These parts are called sub-documents. While full-document retrievals retrieve the entire document and full-document updates require sending the entire document, sub-document retrievals only retrieve the relevant parts of a document and sub-document updates only require sending the updated portions of a document. You should use sub-document operations when you are modifying only portions of a document, and full-document operations when the contents of a document are to change significantly.
Note: In the case of concurrent sub-document API operations from multiple clients, Couchbase Server does not honor the atomicity (MB-21597). This can result in one update overwriting the previous update, even though the previous update was successful as far as the client is concerned. Workaround: For use cases where the sub-document API is being used and which are covered by an Enterprise Edition license, contact Couchbase Support. For all other cases, Couchbase recommends that you do not use the sub-document API feature for concurrent updates.
In order to use sub-document operations you need to specify a path indicating the location of the sub-document. The path follows N1QL syntax. Considering the document
customer123.json
{
  "name": "Douglas Reynholm",
  "email": "douglas@reynholmindustries.com",
  "addresses": {
    "billing": {
      "line1": "123 Any Street",
      "line2": "Anytown",
      "country": "United Kingdom"
    },
    "delivery": {
      "line1": "123 Any Street",
      "line2": "Anytown",
      "country": "United Kingdom"
    }
  },
  "purchases": {
    "complete": [
      339, 976, 442, 666
    ],
    "abandoned": [
      157, 42, 999
    ]
  }
}
the paths name, addresses.billing.country and purchases.complete[0] are all valid paths.
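To make the path notation concrete, the following is a minimal local sketch that resolves such paths against a plain Python dict. It only models the notation; it is not how the server or the SDK evaluates paths. The helper name resolve_path is hypothetical, and the sketch assumes unescaped dictionary keys and [n] array indexes only.

```python
import re

# Hypothetical helper for illustration only; not part of the Couchbase SDK.
# Supports plain dictionary keys and [n] array indexes, with no escaping.
_TOKEN = re.compile(r'\[(-?\d+)\]|([^.\[\]]+)')

def resolve_path(doc, path):
    """Walk `doc` along `path` and return the value found there."""
    current = doc
    for index, key in _TOKEN.findall(path):
        if index:
            current = current[int(index)]   # array component such as [0]
        else:
            current = current[key]          # dictionary component
    return current

# A trimmed-down copy of the customer123 document shown above.
customer = {
    "name": "Douglas Reynholm",
    "addresses": {"billing": {"country": "United Kingdom"}},
    "purchases": {"complete": [339, 976, 442, 666]},
}

print(resolve_path(customer, "name"))
print(resolve_path(customer, "addresses.billing.country"))
print(resolve_path(customer, "purchases.complete[0]"))
```

Each dotted component descends one dictionary level, and each bracketed component selects an array element.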

Retrieving

The lookup-in operation queries the document for one or more paths and returns their contents. You can either retrieve the contents of a path using the subdoc-get sub-document operation, or simply check whether the path exists using the subdoc-exists sub-document operation. The latter saves even more bandwidth by not retrieving the contents of the path when they are not needed.
Retrieve sub-document value
import couchbase.subdocument as SD  # Use SD alias for brevity
rv = bucket.lookup_in('customer123', SD.get('addresses.delivery.country'))
country = rv[0] # => 'United Kingdom'
Check existence of sub-document path
bucket.lookupIn('customer123').exists('purchases.pending[-1]').execute(
    function(err, result) {
        console.log('Path exists? %j', result.exists('purchases.pending[-1]'));
    }
);
// Path exists? false
Multiple operations can be combined as well:
Combine multiple lookup operations
$frags = $bucket->lookupIn('customer123')
    ->get('addresses.delivery.country')
    ->exists('purchases.pending[-1]')
    ->execute();

echo $frags->value[0]['value'] . ", Code=" . $frags->value[0]['code'] . "\n";
echo $frags->value[1]['value'] . ", Code=" . $frags->value[1]['code'] . "\n";
# United Kingdom, 0 (No error)
# , 63 (COUCHBASE_SUBDOC_PATH_ENOENT)

Mutating

Mutation operations modify one or more paths in the document. The simplest of these operations is subdoc-upsert, which, just like the fulldoc-level upsert, will either modify the value of an existing path or create it if it does not exist:

Upserting a new sub-document
bucket.mutate_in('customer123', SD.upsert('fax', '775-867-5309'))
Likewise, the subdoc-insert operation will only add the new value to the path if it does not exist:
Inserting a sub-document
bucket.mutate_in('customer123', SD.insert('purchases.complete', [42, True, None]))
# SubdocPathExistsError
Dictionary values can also be replaced or removed, and you may combine any number of mutation operations within the same general mutate-in API. Here's an example of one which replaces one path and removes another.
bucket.mutate_in('customer123',
                 SD.remove('addresses.billing'),
                 SD.replace('email', 'doug96@hotmail.com'))

Array append and prepend

The subdoc-array-prepend and subdoc-array-append operations are true array prepend and append operations. Unlike fulldoc append/prepend operations (which simply concatenate bytes to the existing value), subdoc-array-append and subdoc-array-prepend are JSON-aware:

bucket.mutate_in('customer123', SD.array_append('purchases.complete', 777))
# purchases.complete is now [339, 976, 442, 666, 777]
bucket.mutate_in('customer123', SD.array_prepend('purchases.abandoned', 18))
# purchases.abandoned is now [18, 157, 42, 999]
If your document only needs to contain an array, you do not have to create a top-level object wrapper to contain it. Simply initialize the document with an empty array and then use the empty path for subsequent sub-document array operations:
Creating and populating an array document
bucket.upsert('my_array', [])
bucket.mutate_in('my_array', SD.array_append('', 'some element'))
# the document my_array is now ["some element"]
If you wish to add multiple values to an array, you may do so by passing multiple values to the array-append, array-prepend, or array-insert operations. Be sure to know the difference between passing a collection of multiple elements (in which case the collection is inserted as a single element in the array, as a sub-array) and passing multiple elements (in which case the elements are appended individually to the array):
Add multiple elements to an array
bucket.mutate_in('my_array', SD.array_append('', 'elem1', 'elem2', 'elem3'))
# the document my_array is now ["some element", "elem1", "elem2", "elem3"]
Add single array as element to existing array
bucket.mutate_in('my_array', SD.array_append('', ['elem1', 'elem2', 'elem3']))
# the document my_array is now ["some element", ["elem1", "elem2", "elem3"]]
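The distinction between the two calls mirrors the difference between extending and appending to a plain Python list. The sketch below is only a local analogy using built-in lists; it involves no server or SDK call.

```python
# Local analogy using plain Python lists; no server is involved.
individual = ["some element"]
individual.extend(["elem1", "elem2", "elem3"])   # like passing three separate values
print(individual)   # ['some element', 'elem1', 'elem2', 'elem3']

nested = ["some element"]
nested.append(["elem1", "elem2", "elem3"])       # like passing one list as the value
print(nested)       # ['some element', ['elem1', 'elem2', 'elem3']]
```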
Note that passing multiple values to a single array-append operation is faster and uses less bandwidth than specifying a separate array-append operation for each element.
Adding multiple elements to array (slow)
bucket.mutate_in('my_array',
                  SD.array_append('', 'elem1'),
                  SD.array_append('', 'elem2'),
                  SD.array_append('', 'elem3'))
If you wish to create an array if it does not exist and also push elements to it within the same operation, you may use the create-parents option:
bucket.mutate_in('some_doc',
                  SD.array_append('some.array', 'Hello', 'World',
                                  create_parents=True))

Arrays as unique sets

Limited support also exists for treating arrays like unique sets, using the subdoc-array-addunique command. This checks whether the given value already exists before adding the item to the array:

bucket.mutate_in('customer123', SD.array_addunique('purchases.complete', 95))
# => Success
bucket.mutate_in('customer123', SD.array_addunique('purchases.abandoned', 42))
# => SubdocPathExistsError exception!
Note that currently the addunique will fail with a Path Mismatch error if the array contains JSON floats, objects, or arrays. The addunique operation will also fail with Cannot Insert if the value to be added is one of those types as well.

Note that the actual position of the new element is undefined, and that the array is not ordered.
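The add-unique rules described above can be modelled locally as follows. This is a sketch under stated assumptions, not the server's implementation: the function add_unique and the three exception classes are hypothetical names, and Python 3 types stand in for JSON types.

```python
# Hypothetical local model of the add-unique rules; not SDK code.
class PathMismatch(Exception):
    pass

class CannotInsert(Exception):
    pass

class PathExists(Exception):
    pass

# float, dict and list are deliberately excluded from the allowed types.
_PRIMITIVES = (str, int, bool, type(None))

def _is_primitive(value):
    return isinstance(value, _PRIMITIVES)

def add_unique(array, value):
    """Append `value` to `array` only if the rules described above allow it."""
    if not _is_primitive(value):
        raise CannotInsert(value)      # value is a float, object or array
    if not all(_is_primitive(item) for item in array):
        raise PathMismatch(array)      # array holds a float, object or array
    # Compare types as well as values so that 1 and True stay distinct, as in JSON.
    if any(type(item) is type(value) and item == value for item in array):
        raise PathExists(value)
    array.append(value)                # the real position is undefined

complete = [339, 976, 442, 666]
add_unique(complete, 95)               # succeeds

abandoned = [157, 42, 999]
try:
    add_unique(abandoned, 42)
except PathExists:
    print("42 is already present")
```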

Array insertion

New elements can also be inserted into an array. While append will place a new item at the end of an array and prepend will place it at the beginning, insert allows an element to be inserted at a specific position. The position is indicated by the last path component, which should be an array index. For example, to insert "cruel" as the second element in the array ["Hello", "world"], the code would look like:
bucket.mutate_in('array', SD.array_insert('[1]', 'cruel'))
Note that the array must already exist and that the index must be valid (i.e. it must not point to an element which is out of bounds).
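As a local illustration of the expected result, the sketch below performs the same insertion on a plain Python list and rejects out-of-bounds positions explicitly. The helper array_insert is hypothetical, and treating an index equal to the array length as in bounds is an assumption of this sketch rather than a documented server rule.

```python
def array_insert(array, index, value):
    """Insert `value` before position `index`, rejecting out-of-bounds indexes."""
    # Assumption of this sketch: index == len(array) counts as in bounds.
    if not 0 <= index <= len(array):
        raise IndexError("index %d is out of bounds" % index)
    array.insert(index, value)

words = ["Hello", "world"]
array_insert(words, 1, "cruel")
print(words)   # ['Hello', 'cruel', 'world']
```

The explicit check matters because Python's own list.insert silently clamps an out-of-range index instead of failing.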

Counters and numeric fields

Counter operations allow the manipulation of a numeric value inside a document. These operations are logically similar to the counter operation on an entire document:
rv = bucket.mutate_in('customer123', SD.counter('logins', 1))
cur_count = rv[0] # => 1
The subdoc-counter operation performs simple arithmetic against a numeric value, either incrementing or decrementing the existing value.
bucket.upsert('player432', {'gold': 1000})
rv = bucket.mutate_in('player432', SD.counter('gold', -150))
print('player432 now has {0} gold remaining'.format(rv[0]))
# => player432 now has 850 gold remaining
The existing value for subdoc-counter operations must be within the range of a 64-bit signed integer. If the value does not exist, the subdoc-counter operation will create it (and its parents, if create-parents is enabled).
Note that there are several differences between subdoc-counter and the full-document counter operations:
  • Sub-document counters have a range of -9223372036854775808 to 9223372036854775807 (i.e. INT64_MIN and INT64_MAX), whereas full-document counters have a range of 0 to 18446744073709551615 (UINT64_MAX)
  • Sub-document counter operations protect against overflow and underflow, returning an error if the operation would exceed the range. Full-document counters will use normal C semantics for overflow (in which the overflow value is carried over above 0), and will silently fail on underflow, setting the value to 0 instead.
  • Sub-document counter operations can operate on any numeric value within a document, while full-document counter operations require a specially formatted counter document with only the counter value.
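The range and overflow differences listed above can be sketched with ordinary Python integers. Both functions below are hypothetical local models that follow the behavior as described in the list; they are not SDK calls and do not contact a server.

```python
INT64_MIN = -2**63
INT64_MAX = 2**63 - 1
UINT64_MAX = 2**64 - 1

def subdoc_style_counter(current, delta):
    """Reject results outside the signed 64-bit range instead of wrapping."""
    result = current + delta
    if not INT64_MIN <= result <= INT64_MAX:
        raise OverflowError("delta would exceed the signed 64-bit range")
    return result

def fulldoc_style_counter(current, delta):
    """Wrap around on overflow and clamp to 0 on underflow."""
    result = current + delta
    if result < 0:
        return 0
    return result % (UINT64_MAX + 1)

print(subdoc_style_counter(1000, -150))        # 850
print(fulldoc_style_counter(UINT64_MAX, 1))    # 0 (wrapped around)
print(fulldoc_style_counter(5, -10))           # 0 (clamped)
```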

Executing multiple operations

Multiple sub-document operations can be executed at once on the same document, allowing you to retrieve or modify several sub-documents at once. When multiple operations are submitted within the context of a single lookup-in or mutate-in command, the server will execute all the operations with the same version of the document.
Note: Unlike batched operations, which are simply a way of sending multiple individual operations efficiently over the network, multiple sub-document operations are formed into a single command packet, which is then executed atomically on the server. You can submit up to 16 operations at a time.
When submitting multiple mutation operations within a single mutate-in command, those operations are considered to be part of a single transaction: if any of the mutation operations fail, the server will logically roll-back any other mutation operations performed within the mutate-in, even if those commands would have been successful had another command not failed.

When submitting multiple retrieval operations within a single lookup-in command, the status of each command does not affect any other command. This means that it is possible for some retrieval operations to succeed and some others to fail. While their statuses are independent of each other, you should note that operations submitted within a single lookup-in are all executed against the same version of the document.
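The all-or-nothing behavior of a multi-operation mutate-in can be modelled locally by applying every operation to a copy of the document and keeping the copy only if all of them succeed. This is a sketch of the semantics only, not of how the server implements them; apply_all_or_nothing and the two operation functions are hypothetical names.

```python
import copy

MAX_OPERATIONS = 16   # limit quoted in the note above

def apply_all_or_nothing(doc, operations):
    """Apply every operation to a copy; return the copy only if all succeed."""
    if len(operations) > MAX_OPERATIONS:
        raise ValueError("too many operations in one command")
    working = copy.deepcopy(doc)
    for operation in operations:
        operation(working)        # any exception abandons the whole batch
    return working

def replace_email(d):
    d["email"] = "doug96@hotmail.com"

def remove_missing(d):
    del d["no_such_field"]        # raises KeyError, so the batch fails

doc = {"email": "douglas@reynholmindustries.com"}
try:
    doc = apply_all_or_nothing(doc, [replace_email, remove_missing])
except KeyError:
    pass                          # the original document is left untouched

print(doc["email"])   # douglas@reynholmindustries.com
```

Because the second operation fails, the email replacement is discarded as well, even though it would have succeeded on its own.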

Creating parents

Sub-document mutation operations such as subdoc-upsert or subdoc-insert will fail if the immediate parent is not present in the document. Consider:
{
    "level_0": {
        "level_1": {
            "level_2": {
                "level_3": {
                    "some_field": "some_value"
                }
            }
        }
    }
}
Looking at the some_field field (which is really level_0.level_1.level_2.level_3.some_field), its immediate parent is level_3. If we were to attempt to insert another field, level_0.level_1.level_2.level_3.another_field, it would succeed because the immediate parent is present. However, if we were to attempt to subdoc-insert to level_0.level_1.level_2.foo.bar it would fail, because level_0.level_1.level_2.foo (which would be the immediate parent) does not exist. Attempting to perform such an operation would result in a Path Not Found error.
By default the automatic creation of parents is disabled, as a simple typo in application code can result in a rather confusing document structure. Sometimes, however, it is necessary to have the server create the hierarchy. In this case, the create-parents or create-intermediates option may be used.
bucket.mutate_in('customer123',
                 SD.upsert('level_0.level_1.foo.bar.phone',
                           {'num': '775-867-5309', 'ext': 16},
                           create_parents=True))
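The effect of the create-parents option can be illustrated with a small local model that works on nested Python dicts. The function upsert below is a hypothetical stand-in written for this sketch (Python 3, dictionary keys only, no array indexes or escaping); it is not the SDK's SD.upsert.

```python
def upsert(doc, path, value, create_parents=False):
    """Set a dotted dictionary path, optionally creating missing parents."""
    *parents, leaf = path.split(".")
    current = doc
    for key in parents:
        if key not in current:
            if not create_parents:
                raise KeyError("path not found: missing parent %r" % key)
            current[key] = {}     # create the missing intermediate object
        current = current[key]
    current[leaf] = value

doc = {"level_0": {"level_1": {}}}

try:
    upsert(doc, "level_0.level_1.foo.bar", 1)   # parent "foo" is missing
except KeyError as exc:
    print("failed:", exc)

upsert(doc, "level_0.level_1.foo.bar", 1, create_parents=True)
print(doc)
```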

CAS Semantics

Subdoc mostly eliminates the need for tracking the CAS value. Subdoc operations are atomic, so if two different threads access two different sub-documents, no conflict will arise. For example, the following two operations can execute concurrently without any risk of conflict:
bucket.mutate_in('customer123', SD.array_append('purchases.complete', 999))
bucket.mutate_in('customer123', SD.array_append('purchases.abandoned', 998))
Even when modifying the same part of the document, operations will not necessarily conflict. For example, two concurrent subdoc-array-append operations on the same array will both succeed, and neither will overwrite the other.

While CAS is no longer required to ensure document updates are preserved, it may still be needed to ensure document state remains consistent over multiple invocations of mutate-in. Sometimes it is important to ensure the entire document did not change state since the last operation, such as in the case of subdoc-remove operations, to ensure that the element being removed was not already replaced by something else.
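The remove case can be sketched as an optimistic check-then-write loop. The example below runs entirely against an in-memory stand-in so that it is self-contained; FakeStore, its methods, and remove_if_unchanged are hypothetical names for this sketch and do not reflect the SDK's API.

```python
class FakeStore:
    """In-memory stand-in for a bucket; each write bumps the document's CAS."""

    def __init__(self):
        self._docs = {}

    def upsert(self, key, value):
        cas = self._docs.get(key, (None, 0))[1] + 1
        self._docs[key] = (value, cas)
        return cas

    def get(self, key):
        return self._docs[key]            # (value, cas)

    def replace(self, key, value, cas):
        if self._docs[key][1] != cas:
            raise RuntimeError("CAS mismatch")
        return self.upsert(key, value)

def remove_if_unchanged(store, key, index, expected, retries=3):
    """Remove array element `index` only while it still holds `expected`."""
    for _ in range(retries):
        value, cas = store.get(key)
        if value[index] != expected:
            return False                  # element was replaced by something else
        updated = value[:index] + value[index + 1:]
        try:
            store.replace(key, updated, cas)
            return True
        except RuntimeError:
            continue                      # document changed in between; re-check
    return False

store = FakeStore()
store.upsert("abandoned", [157, 42, 999])
print(remove_if_unchanged(store, "abandoned", 1, 42))   # True
print(store.get("abandoned")[0])                        # [157, 999]
```

Passing the CAS with the write ensures the removal is applied only to the version of the document that was inspected.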

Error handling

Subdoc operations have their own set of errors. When programming with subdoc, be prepared for any of the full-document errors (such as Document Not Found) as well as special sub-document errors which are received when certain constraints are not satisfied. Some of the errors include:
  • Path does not exist: When retrieving a path, this means the path does not exist in the document. When inserting or upserting a path, this means the immediate parent does not exist.
  • Path already exists: In the context of an insert, it means the given path already exists. In the context of array-add-unique, it means the given value already exists.
  • Path mismatch: This means the path may exist in the document, but that there is a type conflict between the path in the document and the path in the command. Consider the document,
    { "tags": ["reno", "nevada", "west", "sierra"] }
    The path tags.sierra is a mismatch, since tags is actually an array, while the path assumes it is a JSON object (dictionary).
  • Document not JSON: This means you are attempting to modify a binary document using sub-document operations.
  • Invalid path: This means the path is invalid for the command. Certain commands such as subdoc-array-insert expect array elements as their final component, while others such as subdoc-upsert and subdoc-insert expect dictionary (object) keys.
Because sub-document operations are executed using either lookup-in or mutate-in, if a command fails a top-level error is reported (Multi Command Failure), rather than an individual error code (e.g. Path Not Found). When receiving a top-level error code, you should traverse the results of the command to see which individual operation failed.

Path syntax

Path syntax largely follows N1QL conventions: A path is divided into components, with each component referencing a specific level in a document hierarchy. Components are separated by dots (.) in the case where the element left of the dot is a dictionary, or by brackets ([n]) where the element left of the bracket is an array and n is the index within the array.

As a special extension, you can indicate the last element of an array by using an index of -1, for example to get the last element of the array in the document
{"some":{"array":[1,2,3,4,5,6,7,8,9,0]}}
Use some.array[-1] as the path, which will return the element 0.
Each path component must be a valid JSON string, as if it were surrounded by quotes, and any character in the path which would invalidate it as a JSON string must be escaped by a backslash (\). In other words, the path component must match exactly the path inside the document itself. For example:
{"literal\"quote": {"array": []}}
must be referenced as literal\"quote.array.
If the path also has special path characters (i.e. a dot or brackets) it may be escaped using N1QL escapes. Considering the document
{"literal[]bracket": {"literal.dot": true}}
a path such as `literal[]bracket`.`literal.dot` references the value true. You can use double backticks (``) to reference a literal backtick.
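To show how backtick quoting changes the way a path is divided, here is a simplified local splitter. It is a sketch under stated assumptions, not the server's parser: split_path is a hypothetical helper, it treats a doubled backtick as a literal backtick only inside a quoted component, and it leaves array brackets in unquoted components untouched.

```python
def split_path(path):
    """Split a path into components, honoring backtick-quoted components."""
    components, current, quoted = [], [], False
    i = 0
    while i < len(path):
        ch = path[i]
        if ch == "`":
            if quoted and path[i + 1:i + 2] == "`":
                current.append("`")       # doubled backtick = literal backtick
                i += 1                    # skip the second backtick
            else:
                quoted = not quoted       # opening or closing quote
        elif ch == "." and not quoted:
            components.append("".join(current))
            current = []
        else:
            current.append(ch)            # includes dots and brackets when quoted
        i += 1
    components.append("".join(current))
    return components

print(split_path("addresses.billing.country"))
print(split_path("`literal[]bracket`.`literal.dot`"))
```

The second call yields two components, literal[]bracket and literal.dot, because the dot and brackets inside the backticks are treated as ordinary characters.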

If you need to combine both JSON and path-syntax literals you can do so by escaping the component from any JSON string characters (e.g. a quote or backslash) and then encapsulating it in backticks (`path`). Here is such an example in Python:

import json
def escape_component(component):
    component = json.dumps(component)[1:-1]
    return '`' + component + '`'

print(escape_component("Hello!")) # `Hello!`
print(escape_component("backtick[]")) # `backtick[]`
print(escape_component("[\"mixed\\")) # `[\"mixed\\`
Note: Currently, paths cannot exceed 1024 characters, and cannot be more than 32 levels deep.