Concurrent Document Mutations
Edit this article in GitHub
Version 2.4

Concurrent Document Mutations

You can use the CAS value to control how concurrent document modifications are handled. It helps avoid and control potential race conditions in which some mutations may be inadvertently lost or overridden by other mutations.

The CAS is a value representing the current state of an item. Each time the item is modified, its CAS changes.

The CAS value itself is returned as part of a document’s metadata whenever a document is accessed. In the SDK this is present as the cas field in the applicable document container object when an operation is successful.

CAS is an acronym for Compare And Swap, and is known as a form of optimistic locking. The CAS can be supplied as parameters to the insert, upsert, replace, and remove operations. When applications provide the CAS, server will check the application-provided version of CAS against its own version of the CAS:
  • If the two CAS values match (they compare successfully), then the mutation operation succeeds.
  • If the two CAS values differ, then the mutation operation fails.
CAS, on the server-side might be implemented along these lines
uint Replace(string docid, object newvalue, uint oldCas=0) {
    object existing = this.kvStore.get(docid);
    if (!existing) {
        throw DocumentDoesNotExist();
    } else if (oldCas != 0 && oldCas != existing.cas) {
        throw CasMismatch();
    }
    uint newCas = ++existing.cas;
    existing.value = newValue;
    return newCas;
}

Demonstration

The following demonstrates how the server handles CAS. A use case for employing the CAS is when adding a new field to an existing document. At the application level, this requires the following steps:
  1. Read entire document.
  2. Perform modification locally.
  3. Store new document to server.
Assume the following two blocks of code are executing concurrently in different application instances:
Table 1. CAS flow
Thread #1 Thread #2
>>> result = cb1.get('docid')
>>> new_doc = result.value
>>> new_doc['field1'] = 'value1'
>>> cb1.replace('docid', new_doc)
>>> result = cb2.get('docid')
>>> new_doc = result.value
>>> new_doc['field2'] = 'value2'
>>> cb2.replace('docid', new_doc)
Retrieving the document again yields:
>>> cb1.get('docid').value
{u'field2': u'value2', u'a_field': u'a_value'}

Note that field1 is not present, even though the application inserted it into the document. The reason is because the replace on Thread #2 happened to run after the replace on Thread #1, however Thread #1’s replace was executed after Thread #2’s get: Since the local version of the document on Thread #2 did not contain field1 (because Thread #1’s update was not stored on the server yet), by executing the replace, it essentially overrode the replace performed by Thread #1.

0.00ms (#2): new_doc = get("docid").value
0.00ms (#1): new_doc = get("docid").value
0.00ms (#1): new_doc["field1"] = "value1"
0.00ms (#2): new_doc["field2"] = "value2"
0.01ms (#1): cb.replace("docid", new_doc)
0.02ms (#2): cb.replace("docid", new_doc)

Using CAS - Example

In the prior example, we saw that concurrent updates to the same document may result in some updates being lost. This is not because Couchbase itself has lost the updates, but because the application was unaware of newer changes made to the document and inadvertently override them.
Table 2. CAS flow
   
>>> result = cb1.get('docid')
>>> new_doc = result.value
>>> print new_doc
{u'a_field': u'a_value'}
>>> cur_cas = result.cas
>>> print cur_cas
272002471883283
>>> new_doc['field1'] = 'value1'
>>> new_result = cb1.replace(
       'docid',
       new_doc,
       cas=cur_cas)
Note: Server's CAS matches cur_cas. New CAS assigned
>>> print new_result.cas
195896137937427
>>> result = cb2.get('docid')
>>> new_doc = result.value
>>> print new_doc
{u'a_field': u'a_value'}
>>> cur_cas = result.cas
>>> print cur_cas
272002471883283
>>> new_doc['field2'] = 'value2'
>>> new_result = cb2.replace(
       'docid',
       new_doc,
       cas=cur_cas)
Warning: CAS on server differs: 195896137937427 vs 272002471883283!

Handling CAS errors

If the item’s CAS has changed since the last operation performed by the current client (i.e. the document has been changed by another client), the CAS used by the application is considered stale. If a stale CAS is sent to the server (via one of the mutation commands, as above), the server will reply with an error, and the Couchbase SDK will accordingly return this error to the application (either via return code or exception, depending on the language).
Note: The error returned for a CAS mismatch is the same that is returned when trying to insert an already-existing document. The error code is not ambiguous since the CAS option is only accepted (and only makes sense) for documents which already exist.
How to handle this error depends on the application logic. If the application wishes to simply insert a new property within the document (which is not dependent on other properties within the document), then it may simply retry the read-update cycle by retrieving the item (and thus getting the new CAS), performing the local modification and then uploading the change to the server. For example, if a document represents a user, and the application is simply updating a user’s information (like an email field), the method to update this information may look like this:
def update_email(cb, user_id, email):
    while True:
        result = cb.get(user_id)
        user_doc = result.value
        user_doc['email'] = email
        cb.upsert(user_id, user_doc, cas=result.cas)
        break # Mutation succeeded
    except KeyExistsError:
        # This means the CAS has been modified. We still need to
        # insert the email field, but do not want to override new changes
        continue

Sometimes more logic is needed when performing updates, for example, if a property is mutually exclusive with another property; only one or the other can exist, but not both.

Performance considerations

CAS operations incur no additional overhead. CAS values are always returned from the server for each operation. Comparing CAS at the server involves a simple integer comparison which incurs no overhead.

CAS value format

The CAS value should be treated as an opaque object at the application level. No assumptions should be made with respect to how the value is changed (for example, it is wrong to assume that it is a simple counter value). In the SDK, the CAS is represented as a 64 bit integer for efficient copying but should otherwise be treated as an opaque 8 byte buffer.

Pessimistic locking

While CAS is the recommended way to perform locking and concurrency control, Couchbase also offers explicit locking. When a document is locked, attempts to mutate it without supplying the correct CAS will fail.

Documents can be locked using the get-and-lock operation and unlocked either explicitly using the unlock operation or implicitly by mutating the document with a valid CAS. While a document is locked, it may be retrieved but not modified without using the correct CAS value. When a locked document is retrieved, the server will return -1 (or 0xffffffffffffffff) as the CAS value.

This handy table shows various behaviors while an item is locked:
Table 3. Behavior of various operations on a locked item
Operation Result
get-and-lock Temporary Failure Error.
get Always succeeds, but with an invalid CAS value returned (so it cannot be used as an input to subsequent mutations).
unlock with bad/missing CAS value Temporary Failure Error
unlock with correct CAS Mutation is performed and item is unlocked. It can now be locked again and/or accessed as usual.
Mutate (upsert, replace, etc) with bad/missing CAS value KeyExists/BadCAS Error
Mutate with correct CAS value Mutation is performed and item is unlocked. It can now be locked again and/or accessed as usual.

A document can be locked for a maximum of 15 seconds, after which the server will unlock it. This is to prevent misbehaving applications from blocking access to documents inadvertently. You can modify the time the lock is held for (though it can be no longer than 15 seconds).

Be sure to keep note of the cas value when locking a document. You will need it when unlocking. The following blocks show how to use lock and unlock with the C SDK. This process should be similar for other SDKs.
lcb_CMDGET cmd = { 0 };
LCB_CMD_SET_KEY(&cmd, "key", 3);
cmd.lock = 1;
cmd.exptime = 5;
lcb_get3(instance, NULL, &cmd);
lcb_wait(instance);
static int unlock_only = 0;

void get_handler(lcb_t instance, int cbtype, const lcb_RESPBASE *rb)
{
    const lcb_RESPGET *resp = (const lcb_RESPGET *)rb;

    union {
        lcb_CMDBASE base;
        lcb_CMDSTORE store;
        lcb_CMDUNLOCK unlock;
    } u;

    if (rb->rc == LCB_ETMPFAIL) {
        printf("Key already locked!");
    } else if (rb->rc == LCB_SUCCESS) {
        memset(&u, 0, sizeof u);
        LCB_CMD_SET_KEY(&u.base, rb->key, rb->nkey);
        u.base.cas = rb->cas;
        if (unlock_only) {
            lcb_unlock3(instance, NULL, &u.unlock);
        } else {
            LCB_CMD_SET_VALUE(&u.store, "value", 5);
            u.store.operation = LCB_REPLACE;
            lcb_store3(instance, NULL, &u.store);
        }
    }
}
The handler will unlock the item either via an explicit unlock operation ( lcb_unlock3) or implicitly via modifying the item with the correct CAS.

If the item has already been locked, the server will respond with LCB_ETMPFAIL which means that the operation could not be executed temporarily, but may succeed later on.