Sub-Document API

The sub-document API enables you to access parts of JSON documents (sub-documents) efficiently without requiring the transfer of the entire document over the network. This improves performance and brings better efficiency to the network IO path, especially when working with large JSON documents.

The key-value APIs in Couchbase operate on entire documents. For small documents or binary values, retrieving and updating the whole document is acceptable and often desirable. For large documents, however, retrieving an entire document to read or update a single field is impractical: modifying one field means fetching the whole document over the network, changing the field locally, and sending the whole modified document back to be saved in the database.
Note: The key-value APIs can also operate on binary formats which are not supported by sub-document APIs. Append operations on binary values are always atomic and do not retrieve the document to perform the append.
With the addition of the sub-document API, you can now access and operate on individual JSON fields, sub-document fragments, within a larger JSON document. Consider the following example which uses a sub-document API to retrieve just the last name from a user profile JSON document.

Only the requested or modified fields are sent over the network as opposed to the entire document being sent over the network when using key-value APIs.
Note: The above example shows the underlying Memcache protocol operations. The sub-document APIs are exposed through convenient builder APIs in each of the SDKs. All sub-document operations are atomic at the document level.
Atomically modifying fields within a JSON document is typically suited to the following scenarios:
  • An application does not have the existing document available locally and wishes to make a predetermined change to a specific field as part of a routine operation. For example, incrementing a statistics counter or a login counter.
  • An application already has the existing document available locally, but wishes to use an atomic operation for modifying it, to save bandwidth and be more efficient. For example, an existing web session where the user modifies or stores some data such as an updated profile or an updated score.
  • Cross-referencing scenarios, where an application-defined relationship exists between two documents. In the context of social gaming, this may be thought of as sending messages between inboxes.
    1. User #1 sends a message to User #2.
    2. This may be implemented as: generate a key for the inbox message, store it somewhere.
    3. docAddValue('user:1', 'sent', ['user:2', 'keyToMessage'])
    4. docAddValue('user:2', 'inbox', ['user:1', 'keyToMessage'])
Note: The following blogs explain how the sub-document API is expressed using different SDKs:
Consider a simple Java example that uses the sub-document API to connect to the travel-sample bucket, fetch the name field from the document "airline_13633", and then print it.
Fetch.java

// Fetch and print the name from an airline
DocumentFragment<Lookup> resultLookup = bucket.lookupIn("airline_13633").get("name").execute();
LOGGER.info(resultLookup.content("name", String.class));

The API for sub-document operations uses dot notation to identify the logical location of an attribute within a document. This is consistent with N1QL's path syntax for referring to individual fields in a document. In the example below, the path to the last name field is "name.last".

Updates to a field are atomic and do not collide with updates to a different field on the same key. For example, the following operations do not collide although they are updating the same document.
[Thread 1]
bucket.mutateIn("user").upsert("name.last","Lennon",false).execute();
[Thread 2]
bucket.mutateIn("user").upsert("email","jlennon@abc.com",false).execute();

Commands

This section lists the available sub-document commands. There are two categories of commands exposed through builder APIs in the SDKs:
  • lookupIn commands which are used to read data from existing documents.
  • mutateIn commands which are used to modify documents.

Sub-document commands are named similarly to their full-document counterparts, but they perform the logical key-value operation on a path within a single document rather than on the entire document. In addition to retrieving and setting fields, the sub-document API allows true "append" and "prepend" operations on arrays, as well as increment and decrement operations on numeric values.

Lookup Commands

There are two sub-document lookup commands - get and exists.

get returns the value at a specific path in a single document. It can be used to return any JSON value (a primitive, a dictionary, or an array), assuming a suitable path is constructed. For example, consider the following document from the travel-sample dataset:
{
  "id": 55136,
  "type": "route",
  "airline": "U2",
  "airlineid": "airline_2297",
  "sourceairport": "MAN",
  "destinationairport": "AMS",
  "stops": 0,
  "equipment": ["320", "319"],
  "active": true,
  "schedule": [
    {
      "day": 0, "utc": "17:37:00", "flight": "U2219"
    },
    {
      "day": 1, "utc": "07:58:00", "flight": "U2839"
    }
  ]
}
Using the sub-document get command, the following fields of varying types can be returned via these paths:
  • "id" - 55136 (number)
  • "active" - true (boolean)
  • "schedule[0]" - { "day": 0, "utc": "17:37:00", "flight": "U2219"} (dictionary)
  • "equipment" - ["320", "319"] (array)

The exists command is similar to get, except that it only checks for the existence of a given path, and does not return the document fragment itself. This command can be used to check if a particular path exists in a document, without having to actually receive the fragment.
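
To make the path semantics concrete, the following is a minimal local sketch of how get and exists could resolve a path against the route document shown above. It is not the server's implementation and does not use the Couchbase SDK: the class SubdocPathSketch and its methods are hypothetical names, the document is modelled with plain java.util maps and lists, and the sketch assumes well-formed paths without escaped characters.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical, illustration-only model of path lookups on a parsed JSON document.
class SubdocPathSketch {
    /** Splits "schedule[0].flight" into the tokens "schedule", "[0]", "flight". */
    static List<String> tokens(String path) {
        // Put a dot in front of every "[" so that each index becomes its own token.
        return Arrays.asList(path.replace("[", ".[").split("\\."));
    }

    /** Returns the fragment at the path, or Optional.empty() if the path is absent. */
    static Optional<Object> get(Object doc, String path) {
        Object current = doc;
        for (String token : tokens(path)) {
            if (token.isEmpty()) {
                continue; // produced when the path starts with an array index
            }
            if (token.startsWith("[") && current instanceof List) {
                int index = Integer.parseInt(token.substring(1, token.length() - 1));
                List<?> list = (List<?>) current;
                if (index < 0 || index >= list.size()) {
                    return Optional.empty();
                }
                current = list.get(index);
            } else if (current instanceof Map && ((Map<?, ?>) current).containsKey(token)) {
                current = ((Map<?, ?>) current).get(token);
            } else {
                return Optional.empty();
            }
        }
        return Optional.ofNullable(current);
    }

    /** Like get, but only reports whether the path resolves; no fragment is returned. */
    static boolean exists(Object doc, String path) {
        return get(doc, path).isPresent();
    }

    /** Builds a trimmed copy of the travel-sample route document shown above. */
    static Map<String, Object> sampleRoute() {
        Map<String, Object> first = new LinkedHashMap<>();
        first.put("day", 0);
        first.put("utc", "17:37:00");
        first.put("flight", "U2219");
        Map<String, Object> second = new LinkedHashMap<>();
        second.put("day", 1);
        second.put("utc", "07:58:00");
        second.put("flight", "U2839");
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", 55136);
        doc.put("active", true);
        doc.put("equipment", Arrays.asList("320", "319"));
        doc.put("schedule", Arrays.asList(first, second));
        return doc;
    }
}
```

With this model, get(doc, "schedule[0]") yields the first schedule dictionary, while exists(doc, "schedule[5]") is false because the index is out of range.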

Mutation Commands

The sub-document API supports the addition of new fields, and modification or deletion of existing fields in a JSON document. Different commands are used depending on the type of the field being mutated.

Mutating Dictionary Fields

The sub-document API supports four commands on JSON dictionaries (also known as objects):
  • Creating a new name/value pair using insert.
  • Replacing an existing name/value pair using replace.
  • Creating a new name/value pair or replacing an existing one using upsert.
  • Deleting an existing name/value pair using remove.
The Mutate.java example below shows the use of upsert to update the callsign field for a particular airline document (which is composed of a top-level dictionary):
Mutate.java

// Update CallSign for "Pan Am" to "Clipper"
DocumentFragment<Mutation> resultMutation = bucket.mutateIn("airline_13633").upsert("callsign","CLIPPER",false).execute();

// Fetch and print the callsign from an airline
resultLookup = bucket.lookupIn("airline_13633").get("callsign").execute();
LOGGER.info(resultLookup.content("callsign", String.class));

Mutating Array Fields

The sub-document API supports a similar set of commands on arrays as on dictionaries. It also adds the ability to push items to the beginning or the end of an array, without having to explicitly check the current length of the array.
  • Adding a new element to an array at a specific index using arrayInsert.
  • Pushing a new element to the start or the end of an array using pushFront or pushBack.
  • Replacing an existing index with a new value using replace.
  • Deleting an existing array element (reducing the array size by 1) using remove.
  • Adding a new element only if the value is not already present in the array using addUnique.
The ArraysAndDicts.java example below shows the use of upsert to create a new "fleet" array in an existing document, populated with two aircraft dictionaries (containing the aircraft name and engine count):
ArraysAndDicts.java

// Creates a "fleet" array and pushes aircraft into it
bucket.mutateIn("airline_13633").upsert("fleet", JsonArray.from(
    JsonObject.create().put("name", "747-200B").put("heavy",true).put("engines",4),
    JsonObject.create().put("name", "737-200").put("engines",2)
), false).execute();

The sub-document API also supports enforcing that values are unique in an array, which allows the construction of mathematical sets.

The Unique.java example below shows an example of mathematical sets - each airline has a models array recording what models of aircraft an airline operates. There is a constraint that the elements in models should be unique (a model shouldn’t appear more than once), so the addUnique command is used when adding new models:
Unique.java

// Creates a "models" array and adds UNIQUE values into it
bucket.mutateIn("airline_13633").upsert("models",JsonArray.empty(),false).execute();
bucket.mutateIn("airline_13633").addUnique("models","747-200B",false).addUnique("models","747-120",false).execute();

// This will fail!  The Array already contains the 747-120
try {
    bucket.mutateIn("airline_13633").addUnique("models", "747-120", false).execute();
} catch (PathExistsException ex) {
    LOGGER.info("Whoops!  Model is already part of the models array.");
}

Arithmetic commands

The sub-document API allows basic arithmetic operations (addition and subtraction) to be performed on integer fields in a document using the counter command.

This allows simple counters to be implemented server-side, without the client application having to explicitly fetch the field, update the numeric value and then replace it back again. It also prevents the possibility of another client attempting to perform the update at the same time and the increment or decrement being lost.

Arithmetic operations can only be performed on integer values that can be represented as a signed 64-bit value (i.e. the C type int64_t), and the delta being added or subtracted must also be an int64_t.

The Counter.java example below demonstrates the use of counter to increment two fields - passengers.served and passengers.complained:
Counter.java

// Increment passengers.served counter on the airline
bucket.mutateIn("airline_13633").counter("passengers.served",1L,true).execute();

// Simulate some randomness that a passenger complained while being served
if (new Random().nextInt() % 2 == 0) {
    bucket.mutateIn("airline_13633").counter("passengers.complained",1L,true).execute();
}
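
The counter semantics described in this section can be illustrated with a small local model. The sketch below is an assumption-laden illustration rather than the server's logic: CounterSketch is a hypothetical class, the document is a plain java.util map, missing parent dictionaries are created and absent counters start at 0 (assumed here to mirror the true flag passed to counter above), and Math.addExact stands in for the signed 64-bit range restriction. How the server reports an out-of-range result is not covered by this page.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, illustration-only model of an atomic counter on a nested field.
class CounterSketch {
    /**
     * Adds delta to the integer at a dotted path such as "passengers.served"
     * and returns the new value. The method is synchronized so that concurrent
     * callers cannot lose an increment, which models the server applying each
     * counter operation atomically.
     */
    @SuppressWarnings("unchecked")
    static synchronized long counter(Map<String, Object> doc, String path, long delta) {
        String[] parts = path.split("\\.");
        Map<String, Object> parent = doc;
        // Walk to the parent dictionary, creating missing levels (assumption).
        for (int i = 0; i < parts.length - 1; i++) {
            parent = (Map<String, Object>) parent.computeIfAbsent(
                    parts[i], k -> new LinkedHashMap<String, Object>());
        }
        String field = parts[parts.length - 1];
        long current = ((Number) parent.getOrDefault(field, 0L)).longValue();
        // addExact throws ArithmeticException if the sum leaves the int64 range.
        long updated = Math.addExact(current, delta);
        parent.put(field, updated);
        return updated;
    }
}
```

A negative delta decrements the field, so the same method covers both addition and subtraction.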

Maintaining Data Consistency

When using key-value APIs, updates to a single field require CAS to maintain consistency. For highly contended documents, if a CAS mismatch occurs the operation must be restarted even though the field being modified was not changed by the competing update. Sub-document APIs do not require the use of CAS when updating single fields. However, you can still use CAS protection for the document if your application requires it. For more information on CAS, see Concurrent Document Mutations.

The application logic may require a document modification to be either:
  • Locally consistent with regards to the immediate parent object which contains the value being modified. For example, ensure that a specific object key is unique, or ensure that a specific list item is not duplicated.
  • Globally consistent with regards to the entire document. For example, if the existence of one field in the document only makes sense when another field is in a specific state.
In versions prior to Couchbase Server 4.5, both of these scenarios require the application to make use of CAS to ensure consistency. With the sub-document API model, the local consistency requirement does not require CAS as the server can ensure that the data is consistent atomically. For global consistency requirements, use CAS through the SDKs to ensure that a document's state has not already changed.
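
The read-modify-write cycle that CAS imposes on full-document updates can be sketched with an in-memory stand-in for a stored document. Nothing here is Couchbase API: CasRetrySketch, Versioned, replace, and update are hypothetical names, and the CAS value is modelled as a simple counter that changes on every successful write.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

// Hypothetical, illustration-only model of CAS-protected full-document updates.
class CasRetrySketch {
    /** An immutable (body, cas) pair standing in for a stored document. */
    static final class Versioned {
        final String body;
        final long cas;

        Versioned(String body, long cas) {
            this.body = body;
            this.cas = cas;
        }
    }

    private final AtomicReference<Versioned> stored;

    CasRetrySketch(String body) {
        stored = new AtomicReference<>(new Versioned(body, 1L));
    }

    Versioned get() {
        return stored.get();
    }

    /** Writes newBody only if expectedCas still matches; false signals a mismatch. */
    boolean replace(String newBody, long expectedCas) {
        Versioned current = stored.get();
        if (current.cas != expectedCas) {
            return false;
        }
        return stored.compareAndSet(current, new Versioned(newBody, current.cas + 1));
    }

    /** Repeats fetch, modify, replace until no other writer interferes. */
    int update(UnaryOperator<String> change) {
        int attempts = 0;
        while (true) {
            attempts++;
            Versioned seen = get();
            if (replace(change.apply(seen.body), seen.cas)) {
                return attempts; // number of round trips that were needed
            }
        }
    }
}
```

Every failed attempt in update corresponds to a wasted fetch and send of the whole document. A single-field sub-document update avoids this loop when only local consistency is needed, while a globally consistent change still follows the same pattern with the document's CAS.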

Multi-path Operations

As demonstrated in the examples above, the sub-document API supports operating on multiple paths in a single key with potentially different commands. The builder APIs allow commands to be chained together for efficiency. Multi-path operations can retrieve multiple disjoint fields from a single key atomically. Multi-path operations can also modify multiple disjoint fields from a single key atomically.

Important: A multi-path operation through either the lookupIn or mutateIn builder APIs can only perform a retrieval or a mutation, not both.

Sub-Document API Suitability

The sub-document API is a trade-off in server resource usage between CPU and network bandwidth. When using a sub-document command, the client transmits only the key, the path, and the fragment to change, as opposed to sending the key and the complete value. Depending on the size of the document being operated on and the size of the fragment, this can result in a significant saving of network bandwidth. For example, operating on a 100KB document named "user::j.bloggs" where a 30 byte fragment is added to a path of length 20 bytes would require sending the following over the network:
                                  Size (bytes)
Operation                         Header  Key  Path  Value    Total
Full document (SET)               24      14   -     100,240  100,278
Sub-document (SUBDOC_DICT_ADD)    24      14   20    30       88
In this example, there is a saving of 100,190 bytes using sub-document compared to existing full document operations, or a 99.91% saving in network bandwidth.
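
The arithmetic behind these figures can be reproduced with a few lines of Java. This is only a worked calculation using the sizes quoted in the example (24-byte header, 14-byte key, 20-byte path, 30-byte fragment, 100,240-byte value); BandwidthSketch and its method names are hypothetical.

```java
// Hypothetical helper that recomputes the request sizes from the example above.
class BandwidthSketch {
    static final int HEADER_BYTES = 24;
    static final int KEY_BYTES = "user::j.bloggs".length(); // 14 characters

    /** Request size for a full-document SET: header + key + whole value. */
    static long fullDocBytes(long valueBytes) {
        return HEADER_BYTES + KEY_BYTES + valueBytes;
    }

    /** Request size for a sub-document mutation: header + key + path + fragment. */
    static long subDocBytes(long pathBytes, long fragmentBytes) {
        return HEADER_BYTES + KEY_BYTES + pathBytes + fragmentBytes;
    }

    /** Bandwidth saved by the sub-document request, as a percentage of the full request. */
    static double savingPercent(long fullBytes, long subBytes) {
        return 100.0 * (fullBytes - subBytes) / fullBytes;
    }

    public static void main(String[] args) {
        long full = fullDocBytes(100_240);
        long sub = subDocBytes(20, 30);
        System.out.printf("full=%d sub=%d saved=%d (%.2f%%)%n",
                full, sub, full - sub, savingPercent(full, sub));
    }
}
```

Changing the value, path, or fragment sizes shows how quickly the saving shrinks as the fragment approaches the size of the document.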
However, this bandwidth saving is only possible because the cluster node performs the additional processing to handle this request. The cluster node needs to parse the current JSON value for "user::j.bloggs", apply the requested modification (inserting an element into a dictionary in the above example), and then store the result. The exact CPU required for this will vary considerably depending on a number of factors, including:
  • Size of the existing document.
  • Complexity (different levels of nesting, etc) of the existing document.
  • Type of sub-document operation being performed.
  • Size of the fragment being applied.
In general, the sub-document API is a good fit for applications where network bandwidth is at a premium, and at least one of the following is true:
  • The document being operated on is not very small.
  • The fragment being requested/modified is a small fraction of the total document size.

Limits

There are several sanity-check-like limits when using the sub-document API. These limits are essentially arbitrary but are there to improve performance, conserve memory, and help detect errant code.

  • Paths cannot have more than 32 levels of nesting (e.g. foo is one layer, foo.bar is two layers, and foo.bar[4] is three layers).
  • Paths cannot be longer than 1024 bytes.
  • Documents containing more than 32 levels of nesting cannot be parsed.
  • You may not combine more than 16 operations within a lookup-in or mutate-in command.
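
An application that builds paths dynamically may want to check them against these limits before sending a request. The following is a rough client-side sketch under stated assumptions: SubdocLimitsSketch and its methods are hypothetical, path length is measured in UTF-8 bytes, and escaped names that contain dots or brackets are not handled. The server remains the authority on whether a request is accepted.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical client-side pre-check for the limits listed above.
class SubdocLimitsSketch {
    static final int MAX_PATH_DEPTH = 32;
    static final int MAX_PATH_BYTES = 1024;
    static final int MAX_OPERATIONS = 16;

    /** Counts path levels: foo is 1, foo.bar is 2, foo.bar[4] is 3. */
    static int depth(String path) {
        if (path.isEmpty()) {
            return 0;
        }
        int levels = 1;
        for (int i = 0; i < path.length(); i++) {
            char c = path.charAt(i);
            // Each dot and each array index (other than a leading one) adds a level.
            if (c == '.' || (c == '[' && i > 0)) {
                levels++;
            }
        }
        return levels;
    }

    /** True if the path respects both the nesting limit and the byte-length limit. */
    static boolean pathWithinLimits(String path) {
        return depth(path) <= MAX_PATH_DEPTH
                && path.getBytes(StandardCharsets.UTF_8).length <= MAX_PATH_BYTES;
    }

    /** True if the number of chained operations fits in one lookup-in or mutate-in. */
    static boolean operationCountWithinLimits(int operations) {
        return operations <= MAX_OPERATIONS;
    }
}
```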