Core Operations
Version 2.2

At the core of a database is its ability to store and retrieve data. This section discusses the various ways in which you can access data (Documents) in Couchbase: creating, updating, retrieving, querying, and deleting documents.

Document

A document refers to an entry in the database (other databases may refer to the same concept as a row). A document has an ID (primary key in other databases), which is unique to the document and by which it can be located. The document also has a value which contains the actual application data.

Document IDs are assigned by the application. A valid document ID must:
  • Conform to UTF-8 encoding
  • Be no longer than 250 bytes
    Note: There is a difference between bytes and characters: most non-Latin characters occupy more than a single byte.
You are free to choose any ID for your document, so long as it conforms to the above restrictions. Unlike some other databases, Couchbase does not automatically generate IDs for you, though you may use a separate counter to increment a serial number.
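The two restrictions above can be checked before a document is written. The following is a minimal sketch in plain Python 3; `is_valid_doc_id` is a hypothetical helper, not part of any Couchbase SDK, and it checks only the two rules listed here.

```python
MAX_ID_BYTES = 250  # limit stated above, measured in bytes, not characters


def is_valid_doc_id(doc_id):
    """Return True if doc_id is a string that encodes to at most 250 bytes of UTF-8."""
    if not isinstance(doc_id, str):
        return False
    try:
        encoded = doc_id.encode("utf-8")
    except UnicodeEncodeError:
        # For example a lone surrogate, which has no valid UTF-8 encoding
        return False
    return len(encoded) <= MAX_ID_BYTES


# 125 two-byte characters fit exactly; 126 of them do not, although
# the ID is still far shorter than 250 characters.
fits = is_valid_doc_id("\u00e9" * 125)
too_long = is_valid_doc_id("\u00e9" * 126)
```

Measuring the encoded length rather than `len(doc_id)` is what accounts for the difference between bytes and characters mentioned in the note.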
The document value contains the actual application data; for example, a product document may contain information about the price and description. Documents are usually (but not always) stored as JSON on the server. Because JSON is a structured format, it can be subsequently searched and queried.
{
    "type": "product",
    "sku": "CBSRV45DP",
    "msrp": [5.49, "USD"],
    "ctime": "092011",
    "mfg": "couchbase",
    "tags": ["server", "database", "couchbase", "nosql", "fast", "json", "awesome"]
}

Primitive Key-Value Operations

upsert(docid, document)
insert(docid, document)
replace(docid, document)
get(docid)
remove(docid)

In Couchbase, documents are stored using one of three operations: upsert, insert, and replace. Each of these operations writes a JSON document with a given document ID (key) to the database. They differ in how they treat the existing state of the document:

  • insert will only create the document if the given ID is not found within the database.
  • replace will only replace the document if the given ID already exists within the database.
  • upsert will always store the document, replacing any existing document with the same ID and creating it otherwise.

Documents can be retrieved using the get operation, and finally removed using the remove operation.

Since Couchbase’s KV store may be thought of as a distributed hashmap or dictionary, the following pseudo-code illustrates Couchbase’s update operations:

map<string,object> KV_STORE;

void insert(string doc_id, object value) {
    if (!KV_STORE.contains(doc_id)) {
        KV_STORE.put(doc_id, value);
    } else {
        throw DocumentAlreadyExists();
    }
}

void replace(string doc_id, object value) {
    if (KV_STORE.contains(doc_id)) {
        KV_STORE.put(doc_id, value);
    } else {
        throw DocumentNotFound();
    }
}

void upsert(string doc_id, object value) {
    KV_STORE.put(doc_id, value);
}

object get(string doc_id) {
    if (KV_STORE.contains(doc_id)) {
        return KV_STORE.get(doc_id);
    } else {
        throw DocumentNotFound();
    }
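}

void remove(string doc_id) {
    if (KV_STORE.contains(doc_id)) {
        KV_STORE.erase(doc_id);
    } else {
        throw DocumentNotFound();
    }
}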
You can also use N1QL queries and Full Text Search to access documents by means other than their IDs. However, Couchbase eventually translates these query operations into primitive key-value operations, and they exist as separate services outside the data store.

Storing and Updating Documents

Documents can be stored and updated using the SDK, the command line, or the Web UI. When using a storage operation, the full content of the document is replaced with a new value.

The following example shows a document being stored using the cbc utility. The ID of the document is docid and its value is JSON containing a single field (json) with the value of value.

# When storing JSON data using cbc, ensure it is properly quoted for your shell:
$ cbc create docid -V '{"json":"value"}' -M upsert -U couchbase://cluster-node/bucket-name
docid               Stored. CAS=0x8234c3c0f213
You can also specify additional options when storing a document in Couchbase:
  • Expiry (or TTL) value which will instruct the server to delete the document after a given amount of time. This option is useful for transient data (such as sessions). By default documents do not expire. See Expiry for more information on expiration.
  • CAS value to protect against concurrent updates to the same document. See CAS for a description of how to use CAS values in your application.
  • Durability Requirements
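To illustrate how a CAS value protects against concurrent updates, here is a toy in-memory model in plain Python. It is not the server's implementation: `ToyStore`, `CasMismatch`, and the use of an incrementing integer as the CAS are assumptions made purely for illustration.

```python
import itertools


class CasMismatch(Exception):
    """Raised when the supplied CAS no longer matches the stored document."""


class ToyStore:
    def __init__(self):
        self._docs = {}                      # doc_id -> (value, cas)
        self._next_cas = itertools.count(1)  # assumption: any value that changes on every write

    def upsert(self, doc_id, value, cas=0):
        current = self._docs.get(doc_id)
        # cas=0 means "no check"; otherwise the caller's CAS must match the stored one
        if cas and current is not None and current[1] != cas:
            raise CasMismatch(doc_id)
        new_cas = next(self._next_cas)
        self._docs[doc_id] = (value, new_cas)
        return new_cas

    def get(self, doc_id):
        return self._docs[doc_id]  # (value, cas)


store = ToyStore()
store.upsert("docid", {"stock": 10})

# Two clients read the same version of the document.
value_a, cas_a = store.get("docid")
value_b, cas_b = store.get("docid")

# Client A writes first, which changes the stored CAS.
store.upsert("docid", {"stock": value_a["stock"] - 1}, cas=cas_a)

# Client B's write is rejected because its CAS is now stale.
try:
    store.upsert("docid", {"stock": value_b["stock"] - 1}, cas=cas_b)
    b_succeeded = True
except CasMismatch:
    b_succeeded = False
```

In this model client B would re-read the document and retry, so neither update is silently lost.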
Note: If you wish to only modify certain parts of a document, you can use sub-document operations which operate on specific subsets of documents:
cb.mutate_in('docid', subdoc.array_addunique('tags', 'fast'))
or N1QL UPDATE to update documents based on specific query criteria:
update `default` SET sale_price = msrp * 0.75 WHERE msrp < 19.95;

Retrieving Documents

Fastpath: This section discusses retrieving documents using their IDs, or primary keys. Documents can also be accessed using secondary lookups via N1QL queries and MapReduce. Primary key lookups are performed using the key-value API, which simplifies use and increases performance (as applications may interact with the KV store directly, rather than a secondary index or query processor).
In Couchbase, documents are stored with their IDs. Retrieving a document via its ID is the simplest and quickest operation in Couchbase.
>>> result = cb.get('docid')
>>> print result.value
{'json': 'value'}

$ cbc cat docid
docid                CAS=0x8234c3c0f213, Flags=0x0. Size=16
{"json":"value"}

Once a document is retrieved, it is accessible in the native format by which it was stored; meaning that if you stored the document as a list, it is now available as a list again. The SDK will automatically deserialize the document from its stored format (usually JSON) to a native language type. It is possible to store and retrieve non-JSON documents as well, using a transcoder.
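Conceptually, the round trip looks like the following sketch, which uses only Python's standard `json` module rather than an SDK. The `encode` and `decode` helpers are hypothetical stand-ins for the JSON case of a transcoder, not SDK functions.

```python
import json


def encode(value):
    """Serialize a native Python value to the bytes that would be stored."""
    return json.dumps(value).encode("utf-8")


def decode(raw):
    """Deserialize stored bytes back into a native Python value."""
    return json.loads(raw.decode("utf-8"))


stored = encode(["server", "database", "nosql"])   # what goes over the wire
restored = decode(stored)                          # a list again
```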

You can also modify a document's expiration time while retrieving it; this is known as get-and-touch and allows you to keep temporary data alive while retrieving it in one atomic and efficient operation.

Documents can also be retrieved with N1QL. While N1QL is generally used for secondary queries, it can also be used to retrieve documents by their primary key (ID), though it is recommended to use the key-value API if the ID is known. Lookups may be done either by comparing the META(from-term).id or by using the USE KEYS [...] clause:
SELECT * FROM default USE KEYS ["docid"];
or
SELECT * FROM default WHERE META(default).id = "docid";
You can also retrieve parts of documents using sub-document operations, by specifying one or more sections of the document to be retrieved:
name, email = cb.retrieve_in('user:kingarthur', 'contact.name', 'contact.email')

Counters

You can atomically increment or decrement the numerical value of a special counter document.

A document may be used as a counter if its value is a simple ASCII number, like 42. Couchbase allows you to increment and decrement these values atomically using a special counter operation. The example below (in Python) shows how to use counters:
>>> cb.counter('counter_id', delta=20, initial=100).value
100L
>>> cb.counter('counter_id', delta=1).value
101L
>>> cb.counter('counter_id', delta=-50).value
51L
In the above example, a counter is created by using the counter Python method with an initial value. The initial value is the value the counter uses if the counter ID does not yet exist.

Once created, the counter can be incremented or decremented atomically by a given amount or delta. Specifying a positive delta increments the value and specifying a negative one decrements it. When a counter operation is complete, the application receives the current value of the counter, after the delta has been applied.

Couchbase counters are 64-bit unsigned integers. They do not wrap around if decremented beyond 0; the value stays at 0. However, counters will wrap around if incremented past their maximum value (the largest value a 64-bit unsigned integer can hold). Many SDKs limit the delta argument to the range of a signed 64-bit integer.
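The counter rules described above can be modelled in a few lines of plain Python. This is a sketch of the semantics only, using a dictionary in place of the server; `apply_delta` and `counter` are hypothetical names, not the SDK's implementation.

```python
UINT64_MAX = 2**64 - 1


def apply_delta(current, delta):
    """Apply a counter delta using the rules described above."""
    if delta < 0:
        # Decrementing stops at 0 instead of wrapping around
        return max(0, current + delta)
    # Incrementing wraps around past the 64-bit maximum
    return (current + delta) & UINT64_MAX


def counter(store, doc_id, delta, initial=None):
    """Toy counter: creates the document with `initial` if it is missing."""
    if doc_id not in store:
        if initial is None:
            raise KeyError(doc_id)
        # The delta is not applied when the counter is first created
        store[doc_id] = initial
    else:
        store[doc_id] = apply_delta(store[doc_id], delta)
    return store[doc_id]


store = {}
results = [
    counter(store, "counter_id", delta=20, initial=100),  # created with the initial value
    counter(store, "counter_id", delta=1),
    counter(store, "counter_id", delta=-50),
    counter(store, "counter_id", delta=-500),             # stops at 0
]
wrapped = apply_delta(UINT64_MAX, 1)                      # wraps around to 0
```

The first three results mirror the Python session shown earlier in this section: the initial value is returned unchanged on creation, and later calls apply the delta.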

Expiration times can also be specified when using counter operations.

CAS values are not used with counter operations since counter operations are atomic. The intent of the counter operation is to simply increment the current server-side value of the document. If you wish to only increment the document if it is at a certain value, then you may use a normal upsert function with CAS:
rv = cb.get('counter_id')
value, cas = rv.value, rv.cas
if should_increment_value(value):
    cb.upsert('counter_id', value + increment_amount, cas=cas)

You can also use sub-document counter operations to increment numeric values within a document containing other content.

Raw Byte Concatenation

Warning: The following methods should not be used with JSON documents.

The append and prepend operations operate at the byte level and are unsuitable for dealing with JSON documents. Use these methods only when explicitly dealing with binary or UTF-8 documents. Using the append and prepend methods may invalidate an existing JSON document. You can use sub-document operations if you want to have true JSON-aware prepend and append operations which add values to JSON arrays.

append(docid, fragment)
prepend(docid, fragment)

The append and prepend operations atomically add bytes to the end or beginning of a binary document. They are an efficient alternative to retrieving a binary document in its entirety, appending the contents locally, and then saving the contents back to the server.

Because these methods do raw byte manipulation, they are only suitable for non-JSON documents: prepending or appending anything to a JSON document will invalidate the JSON and make it unparseable by standard JSON parsers.
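The effect is easy to reproduce locally without a server. The sketch below uses only Python's standard `json` module and plain byte concatenation to mimic what a byte-level append does to a JSON document; the document content is made up for the example.

```python
import json

document = json.dumps({"tags": ["server", "database"]}).encode("utf-8")

# A byte-level append simply places the fragment after the closing brace
appended = document + b', "fast"'

try:
    json.loads(appended.decode("utf-8"))
    still_valid = True
except ValueError:
    # json.JSONDecodeError is a subclass of ValueError
    still_valid = False
```

The fragment lands outside the JSON object, so the parser rejects the result even though each piece looks reasonable on its own.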

The semantics of the append and prepend operations are similar to those of the upsert family of operations, except that they accept the fragment to append as their value, rather than the entire document. These functions may be used to add efficiency for custom binary data structures (such as logs), as they avoid transferring the contents of the entire document for each operation. Consider the following two versions, which are equivalent:

Append using get() and replace() (slow)
# Store the document
cb.upsert('binary_doc', '\x01', format=couchbase.FMT_BYTES)

while True:
    # Retrieve the entire document
    rv = cb.get('binary_doc')
    value = rv.value + '\x02'
    try:
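        # Upload the entire document; passing the CAS makes the server reject
        # the write (KeyExistsError) if another client changed it in between
        cb.replace('binary_doc', value, cas=rv.cas, format=couchbase.FMT_BYTES)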
        break
    except couchbase.exceptions.KeyExistsError:
        continue

print repr(cb.get('binary_doc').value)
Append using append() (fast)
# Store the document
cb.upsert('binary_doc', '\x01', format=couchbase.FMT_BYTES)

# Append a fragment
cb.append('binary_doc', '\x02', format=couchbase.FMT_BYTES)

print repr(cb.get('binary_doc').value)
Note that since the append operation is done atomically, there is no need for a CAS check (though one can still be supplied if the document must be at a specific state).
Users of the append and prepend operations should ensure that the resulting documents do not become too large. Couchbase has a hard document size limit of 20MB.

Using append and prepend on larger documents may cause performance degradation and memory fragmentation at the server level, as for each append operation the server must allocate memory for the new document size and then append the fragment to the new memory. The performance impact may be significant when document sizes reach beyond 100KB.

Finally, note that while append saves network traffic from the client to server (by only specifying the fragment to append), the entire document is replicated for each mutation. Five append operations on a single 10MB document will result in 50MB of traffic to each replica.
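The arithmetic behind that figure can be sketched as follows, assuming (as stated above) that the whole document is sent to each replica after every mutation. The helper name and the 100-byte fragment size are hypothetical values chosen for the example.

```python
def replica_traffic_bytes(doc_size, fragment_size, appends):
    """Bytes sent to one replica when the full document is replicated after each append."""
    total = 0
    size = doc_size
    for _ in range(appends):
        size += fragment_size   # the document grows by one fragment
        total += size           # the full document is replicated
    return total


MB = 1000 * 1000
client_bytes = 5 * 100                                  # five 100-byte fragments from the client
replica_bytes = replica_traffic_bytes(10 * MB, 100, 5)  # just over 50 MB to each replica
```

With small fragments the client sends a few hundred bytes while each replica receives roughly the document size multiplied by the number of appends.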

Expiration Overview

Most data in a database is there to be persisted and long-lived. However, the need for transient or temporary data does arise in applications, such as in the case of user sessions, caches, or temporary documents representing a given process ownership. You can use expiration values on documents to handle transient data.

In databases without a built-in expiration feature, dealing with transient data may be cumbersome. To provide "expiration" semantics, applications are forced to record a time stamp in a record, and then upon each access of the record check the time stamp and, if invalid, delete it.

Since some logically ‘expired’ documents might never be accessed by the application, to ensure that temporary records do not persist and occupy storage, a scheduled process is typically also employed to scan the database for expired entries routinely, and to purge those entries that are no longer valid.

Workarounds such as those described above are not required for Couchbase, as it allows applications to declare the lifetime of a given document, eliminating the need to embed "validity" information in documents and eliminating the need for a routine "purge" of logically expired data.

When an application attempts to access a document which has already expired, the server will indicate to the client that the item is not found. The server internally handles the process of determining the validity of the document and removing older, expired documents.

Setting Document Expiration

By default, Couchbase documents do not expire. However, the expiration value may be set for the upsert, replace, and insert operations when modifying data.

Couchbase offers two additional operations for setting the document's expiration without modifying its contents:

  • The get-and-touch operation allows an application to retrieve a document while modifying its expiration time. This method is useful when reading session data from the database: since accessing the data is indicative of it still being "alive", get-and-touch provides a natural way to extend its lifetime.

  • The touch operation allows an application to modify a document’s expiration time without otherwise accessing the document. This method is useful when an application is handling a user session but does not need to access the database (for example, if a particular document is already cached locally).

Couchbase SDKs which accept simple integer expiry values (as opposed to a proper date or time object) allow expiration to be specified in two ways:
  1. As an offset from the current time.
  2. As an absolute Unix time stamp.
If the expiry value is less than 30 days expressed in seconds (60 * 60 * 24 * 30), it is considered an offset. If the value is greater, it is considered an absolute time stamp.

It might be preferable for applications to normalize the expiration value, such as by always converting it to an absolute time stamp. Doing so avoids issues when the intended offset is larger than 30 days: such a value is taken to mean a Unix time stamp and, because that time stamp lies far in the past, the document expires as soon as it is stored.
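The interpretation rule and the suggested normalization can be sketched in plain Python. Both function names are hypothetical, and two details are assumptions not spelled out above: a value of exactly 30 days is treated as an offset, and 0 is taken to mean "no expiration".

```python
import time

THIRTY_DAYS = 60 * 60 * 24 * 30  # threshold in seconds, as described above


def interpret_expiry(expiry, now):
    """Return the absolute expiration time implied by an integer expiry (None = never)."""
    if expiry == 0:
        return None              # assumption: 0 means the document does not expire
    if expiry <= THIRTY_DAYS:
        return now + expiry      # treated as an offset from the current time
    return expiry                # treated as an absolute Unix time stamp


def normalize_expiry(offset_seconds, now=None):
    """Convert an intended offset into an absolute Unix time stamp."""
    if now is None:
        now = int(time.time())
    return now + offset_seconds


now = 1500000000                 # fixed "current time" for a reproducible example
forty_days = 60 * 60 * 24 * 40

# Passing the 40-day offset directly: misread as a time stamp in early 1970
naive = interpret_expiry(forty_days, now)

# Normalizing first yields the intended expiration, 40 days from now
safe = interpret_expiry(normalize_expiry(forty_days, now), now)
```

Here `naive` lies far before `now`, so the document would already be expired, while `safe` is exactly forty days in the future.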

Remember:
  • If you wish to use the expiry feature, then you should supply the expiry value for every mutation operation.
  • When dealing with expiration, it is important to note that most operations will implicitly remove any existing expiration. Thus, when modifying a document with expiration, it is important to pass the desired expiration time.
  • A document is expired as soon as the current time on the Couchbase Server node responsible for the document exceeds the expiration value. Bear this in mind in situations where the time on your application servers differs from the time on your Couchbase Server nodes.

Note that expired documents are not deleted from the server as soon as they expire. While a request to the server for an expired document will receive a response indicating the document does not exist, expired documents are actually deleted (that is, cease to occupy storage and RAM) when the expiry pager runs. The expiry pager is a routine internal process which scans the database for items which have expired and removes them from storage.

When gathering resource usage statistics, note that expired-but-not-purged items (that is, items the expiry pager has not yet scanned) will still be counted toward the overall storage size and item count.
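The difference between "expired" and "purged" can be seen in a toy in-memory model. This is a simplified sketch, not the server's implementation: `ToyExpiringStore`, its simulated clock, and its method names are all assumptions made for illustration.

```python
class ToyExpiringStore:
    def __init__(self):
        self._items = {}   # doc_id -> (value, absolute expiry or None)
        self.now = 0       # simulated server clock, in seconds

    def upsert(self, doc_id, value, expiry=None):
        self._items[doc_id] = (value, expiry)

    def get(self, doc_id):
        value, expiry = self._items[doc_id]   # KeyError if never stored
        if expiry is not None and self.now > expiry:
            # Expired: reported as not found, but not purged in this model
            raise KeyError(doc_id)
        return value

    def item_count(self):
        # Includes expired-but-not-purged items
        return len(self._items)

    def run_expiry_pager(self):
        expired = [key for key, (_, exp) in self._items.items()
                   if exp is not None and self.now > exp]
        for key in expired:
            del self._items[key]
        return len(expired)


store = ToyExpiringStore()
store.upsert("session:1", {"user": "arthur"}, expiry=60)
store.upsert("product:1", {"sku": "CBSRV45DP"})

store.now = 61                       # the session's lifetime has passed
try:
    store.get("session:1")
    found = True
except KeyError:
    found = False

count_before = store.item_count()    # the expired session is still counted
purged = store.run_expiry_pager()
count_after = store.item_count()
```

In this model the lookup fails as soon as the clock passes the expiry, yet the item count only drops once the pager has run.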