Document Operations
Version 2.3

You can access documents in Couchbase using methods of the com.couchbase.client.java.Bucket object.

The methods for retrieving documents are get() and lookupIn() and the methods for mutating documents are upsert(), insert(), replace() and mutateIn().

Examples are shown using the synchronous API. See the section on Async Programming for other APIs.

Additional Options

Update operations also accept a TTL (expiry) value on the passed document, which instructs the server to delete the document after a given amount of time. This option is useful for transient data (such as sessions). By default, documents do not expire. See Expiration Overview for more information on expiration.

Update operations can also accept a CAS (cas) value on the passed document to protect against concurrent updates to the same document. See CAS for a description of how to use CAS values in your application. Since CAS values are opaque, they are normally retrieved when a Document is loaded from Couchbase and then used subsequently (without modification) on the mutation operations. If a mutation succeeds, the returned Document will contain the new CAS value.
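The read, modify, write cycle that CAS protects can be sketched without a cluster. The following is a toy in-memory model written only with the JDK, not SDK code: the CasSketch class, its Versioned holder and the updateWithRetry helper are hypothetical names, and IllegalStateException merely stands in for the CASMismatchException the SDK raises.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Toy in-memory model of CAS-protected updates; none of these names come from the SDK.
final class CasSketch {
    /** A stored value together with the opaque CAS token the "server" assigned to it. */
    static final class Versioned {
        final String content;
        final long cas;
        Versioned(String content, long cas) { this.content = content; this.cas = cas; }
    }

    private final Map<String, Versioned> store = new HashMap<>();
    private long nextCas = 1;

    /** Stores the value unconditionally and hands back the new CAS. */
    synchronized Versioned upsert(String id, String content) {
        Versioned stored = new Versioned(content, nextCas++);
        store.put(id, stored);
        return stored;
    }

    synchronized Versioned get(String id) {
        return store.get(id);
    }

    /** Replaces the value only if the caller's CAS still matches the stored one. */
    synchronized Versioned replace(String id, String content, long expectedCas) {
        Versioned current = store.get(id);
        if (current == null || current.cas != expectedCas) {
            throw new IllegalStateException("CAS mismatch for " + id);
        }
        return upsert(id, content);
    }

    /** Read, modify, write; on a CAS mismatch reload the document and try again. */
    Versioned updateWithRetry(String id, UnaryOperator<String> change, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Versioned loaded = get(id); // assumes the document exists
            try {
                return replace(id, change.apply(loaded.content), loaded.cas);
            } catch (IllegalStateException casMismatch) {
                // Someone else changed the document in between: loop and reload.
            }
        }
        throw new IllegalStateException("gave up after " + maxAttempts + " attempts");
    }

    public static void main(String[] args) {
        CasSketch bucket = new CasSketch();
        Versioned first = bucket.upsert("counter", "1");
        bucket.replace("counter", "2", first.cas);     // succeeds and changes the CAS
        try {
            bucket.replace("counter", "3", first.cas); // reuses the stale CAS
        } catch (IllegalStateException e) {
            System.out.println("stale write rejected");
        }
        System.out.println(bucket.updateWithRetry("counter", old -> old + "!", 3).content);
    }
}
```

The point of the model is that the CAS is never inspected or modified by the caller: it is read together with the content and handed back unchanged on the write.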

Document Input and Output Types

Couchbase stores documents. From an SDK point of view, those documents contain the actual value (like a JSON object) and associated metadata. Every document in the Java SDK contains the following properties, some of them optional depending on the context:

Name Description
id The (per bucket) unique identifier of the document.
content The actual content of the document.
cas The CAS (Compare And Swap) value of the document.
expiry The expiration time of the document.
mutationToken The optional MutationToken after a mutation.
There are a few different implementations of a Document. Here are a few noteworthy document types:
  • JsonDocument: The default one in most methods, contains a JSON object (as a JsonObject).
  • RawJsonDocument: Represents any JSON value, stored as a String (useful for when you have your own JSON serializer/deserializer).
  • BinaryDocument: Used to store pure raw binary data (as a ByteBuf from Netty).
    Important: The ByteBuf comes from Netty, and when reading one from the SDK, you need to manage its memory by hand by calling release(). See the section about binary documents.

Because Couchbase Server can store anything and not just JSON files, many document types exist to satisfy the general needs of an application. You can also write your own Document implementations, which is not covered in this introduction.

Creating and Updating Full Documents

Documents may be created and updated using the Bucket#upsert(), Bucket#insert(), and Bucket#replace() family of methods. Read more about the difference between these methods at Primitive Key-Value Operations in the Couchbase developer guide.

These methods accept a Document instance where the following values are considered if set:
  • id (mandatory): The ID of the document to modify (String).
  • content (mandatory): The desired new content of the document, this varies per document type used. If the JsonDocument is used, the document type is a JsonObject.
  • expiry (optional): Specify the expiry time for the document. If specified, the document will expire and no longer exist after the given number of seconds. See Expiration Overview for more information.
  • cas (optional): The CAS value for the document. If the CAS on the server does not match the CAS supplied to the method, the operation will fail with a CASMismatchException. See Concurrent Document Mutations for more information on the usage of CAS values.
Other optional arguments are also available for more advanced usage:
  • persistTo, replicateTo: Specify durability requirements for the operations.
  • timeout, timeUnit: Specify a custom timeout which overrides the default timeout setting.

Upon success, the returned Document instance will contain the new CAS value of the document. If the document is not mutated successfully, an exception is raised depending on the type of error.

Inserting a document works like this:

JsonDocument doc = JsonDocument.create("document_id", JsonObject.create().put("some", "value"));
System.out.println(bucket.insert(doc));
Output: JsonDocument{id='document_id', cas=216109389250560, expiry=0, content={"some":"value"}, mutationToken=null}

If the same code is called again, a DocumentAlreadyExistsException will be thrown. If you don't care that the document is overwritten, you can use upsert instead:

JsonDocument doc = JsonDocument.create("document_id", JsonObject.empty().put("some", "other value"));
System.out.println(bucket.upsert(doc));
Output: JsonDocument{id='document_id', cas=216109392920576, expiry=0, content={"some":"other value"}, mutationToken=null}

Finally, a full document can be replaced if it existed before. If it didn't exist, then a DocumentDoesNotExistException will be thrown:

JsonDocument doc = JsonDocument.create("document_id", JsonObject.empty().put("more", "content"));
System.out.println(bucket.replace(doc));
Output: JsonDocument{id='document_id', cas=216109395083264, expiry=0, content={"more":"content"}, mutationToken=null}
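
The difference between the three mutations comes down to what each one requires about the document's prior existence. A minimal sketch of those rules, using a plain map instead of a bucket (MutationSketch is a hypothetical class, and IllegalStateException stands in for DocumentAlreadyExistsException and DocumentDoesNotExistException):

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the three full-document mutations; not SDK code.
final class MutationSketch {
    private final Map<String, String> store = new HashMap<>();

    /** Succeeds only when the ID is new. */
    void insert(String id, String content) {
        if (store.containsKey(id)) {
            throw new IllegalStateException("already exists: " + id);
        }
        store.put(id, content);
    }

    /** Always succeeds: creates the document or overwrites it. */
    void upsert(String id, String content) {
        store.put(id, content);
    }

    /** Succeeds only when the ID is already present. */
    void replace(String id, String content) {
        if (!store.containsKey(id)) {
            throw new IllegalStateException("does not exist: " + id);
        }
        store.put(id, content);
    }

    String get(String id) {
        return store.get(id);
    }

    public static void main(String[] args) {
        MutationSketch bucket = new MutationSketch();
        bucket.insert("document_id", "{\"some\":\"value\"}");
        try {
            bucket.insert("document_id", "{}"); // second insert of the same ID fails
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
        bucket.upsert("document_id", "{\"some\":\"other value\"}");
        bucket.replace("document_id", "{\"more\":\"content\"}");
        System.out.println(bucket.get("document_id"));
    }
}
```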

Retrieving full documents

You can retrieve documents using the Bucket#get(), Bucket#getAndLock(), Bucket#getAndTouch() and Bucket#getFromReplica() methods. Each of them serves a distinct purpose and accepts different parameters.

Most of the time you use the get() method. It accepts one mandatory argument:

  • id: The document ID to retrieve
System.out.println(bucket.get("document_id"));
Output: JsonDocument{id='document_id', cas=216109395083264, expiry=0, content={"more":"content"}, mutationToken=null}

Other overloads are available for advanced purposes:

  • document: Instead of just passing an ID, a full document can be passed in. If so, the ID is extracted and used.
  • target: A custom Document type (other than JsonDocument) can be specified.
  • timeout, timeUnit: Specify a custom timeout which overrides the default timeout setting.

// Use a Document where ID is extracted
JsonDocument someDoc = JsonDocument.create("document_id");
System.out.println(bucket.get(someDoc));
Output: JsonDocument{id='document_id', cas=216109395083264, expiry=0, content={"more":"content"}, mutationToken=null}

// A custom Document type, here it returns the plain raw JSON String, encoded.
RawJsonDocument doc = bucket.get("document_id", RawJsonDocument.class);
String content = doc.content();
System.out.println(content);
Output: {"more":"content"}

// Wait only 1 second instead of the default timeout
JsonDocument doc = bucket.get("document_id", 1, TimeUnit.SECONDS);

It is also possible to read from a replica if you want to explicitly trade consistency for availability during the timeframe when the active partition is not reachable (for example during a node failure or netsplit).

getFromReplica has one mandatory argument as well:

  • id: The document ID to retrieve

Since you can have 0 to 3 replicas (and the number can change while your application is running), getFromReplica returns Lists or Iterators. It is recommended to use the Iterator APIs since they provide more flexibility during error conditions (since only partial responses may be retrieved).

Iterator<JsonDocument> docIter = bucket.getFromReplica("document_id");
while(docIter.hasNext()) {
    JsonDocument replicaDoc = docIter.next();
    System.out.println(replicaDoc);
}

Other overloads are available for advanced purposes:

  • replicaMode: Configures which replicas to read from (defaults to all).
  • document: Instead of just passing an ID, a full document can be passed in. If so, the ID is extracted and used.
  • target: A custom Document type (other than JsonDocument) can be specified.
  • timeout, timeUnit: Specify a custom timeout which overrides the default timeout setting.
Tip: In general, always use the ReplicaMode.ALL option rather than ReplicaMode.FIRST and similar to get just the first replica. The reason is that ALL will also try the active node, leading to more reliable behavior during failover. If you just need the first replica, use the iterator approach and break; once you have enough data from the replicas.
Important: Since a replica is updated asynchronously and is eventually consistent, reading from it may return stale results!

If you need pessimistic write locking on a document, use getAndLock(), which retrieves the document if it exists and also returns its CAS value. You must provide the maximum time the document stays locked; the server unlocks it after that time if you have not already updated it with the valid CAS. Note that this is a pure write lock: reading is still allowed.

// Get and Lock for max of 10 seconds
JsonDocument ownedDoc = bucket.getAndLock("document_id", 10);

// Do something with your document
JsonDocument modifiedDoc = modifyDocument(ownedDoc);

// Write it back with the correct CAS
bucket.replace(modifiedDoc);

If the document is locked already and you are trying to lock it again you will receive a TemporaryLockFailureException.
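
The locking rules described above (reads stay allowed, a second lock attempt fails, a write with the lock holder's CAS unlocks, and the lock times out on its own) can be sketched as a toy model. This is not SDK code: LockSketch and its methods are hypothetical, the clock is passed in explicitly so the behavior is deterministic, and IllegalStateException stands in for TemporaryLockFailureException and CASMismatchException.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of a pessimistic write lock; every name here is hypothetical, not SDK API.
final class LockSketch {
    private static final class Doc {
        String content;
        long cas;
        long lockedUntilMillis; // 0 means "not locked"
        Doc(String content, long cas) { this.content = content; this.cas = cas; }
    }

    private final Map<String, Doc> store = new HashMap<>();
    private long lastCas = 0;

    void upsert(String id, String content) {
        store.put(id, new Doc(content, ++lastCas));
    }

    /** Reads never check the lock: it is a pure write lock. */
    String get(String id) {
        return store.get(id).content;
    }

    /** Locks an existing document for at most lockSeconds and returns the CAS needed to write it. */
    long getAndLock(String id, int lockSeconds, long nowMillis) {
        Doc doc = store.get(id);
        if (nowMillis < doc.lockedUntilMillis) {
            throw new IllegalStateException("temporarily locked: " + id);
        }
        doc.cas = ++lastCas; // only the lock holder learns this CAS
        doc.lockedUntilMillis = nowMillis + lockSeconds * 1000L;
        return doc.cas;
    }

    /** A write with the matching CAS succeeds and releases the lock. */
    void replace(String id, String content, long cas, long nowMillis) {
        Doc doc = store.get(id);
        if (cas != doc.cas) {
            boolean locked = nowMillis < doc.lockedUntilMillis;
            throw new IllegalStateException((locked ? "temporarily locked: " : "CAS mismatch: ") + id);
        }
        doc.content = content;
        doc.cas = ++lastCas;
        doc.lockedUntilMillis = 0;
    }

    public static void main(String[] args) {
        LockSketch bucket = new LockSketch();
        bucket.upsert("document_id", "original");
        long cas = bucket.getAndLock("document_id", 10, 0L);
        System.out.println(bucket.get("document_id")); // reading is still allowed
        bucket.replace("document_id", "modified", cas, 2_000L);
        System.out.println(bucket.get("document_id"));
    }
}
```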

It is also possible to fetch the document and reset its expiration value at the same time. See Modifying Expiration for more information.

Removing full documents

You can remove documents using the Bucket.remove() method. This method takes a single mandatory argument:

  • id: The ID of the document to remove.

Some additional options:

  • persistTo, replicateTo: Specify durability requirements for the operations.
  • timeout, timeUnit: Specify a custom timeout which overrides the default timeout setting.

If the cas value is set on the Document overload, it is used to provide optimistic concurrency, very much like the replace operation.

// Remove the document by ID
JsonDocument removed = bucket.remove("document_id");

// Alternatively, load the document first and remove it taking the CAS into account
JsonDocument loaded = bucket.get("document_id");
JsonDocument removedWithCas = bucket.remove(loaded);

Modifying expiration

Many methods support setting the expiry value as part of their other primary operations:

  • Bucket#touch: Resets the expiry time for the given document ID to the value provided.
  • Bucket#getAndTouch: Fetches the document and resets the expiry to the given value provided.
  • Bucket#insert, Bucket#upsert, Bucket#replace: Stores the expiry value alongside the actual mutation when set on the Document instance.

The following example stores a document with an expiry, waits a bit longer and as a result no document is found on the subsequent get:

int expiry = 2; // seconds
JsonDocument stored = bucket.upsert(
    JsonDocument.create("expires", expiry, JsonObject.create().put("some", "value"))
);

Thread.sleep(3000);

System.out.println(bucket.get("expires"));
Output: null

You may also use the Bucket#touch() method to modify expiration without fetching or modifying the document:

bucket.touch("expires", 2);
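
To make the interplay of expiry and touch concrete without sleeping in a test, here is a toy model with an explicit clock. It is only a sketch of the semantics described in this section, not SDK code: ExpirySketch and its method signatures are hypothetical, and times are plain second counts supplied by the caller.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of expiry and touch with an injected clock; not SDK code.
final class ExpirySketch {
    private static final class Entry {
        final String content;
        long expiresAtSeconds; // 0 means "never expires"
        Entry(String content, long expiresAtSeconds) {
            this.content = content;
            this.expiresAtSeconds = expiresAtSeconds;
        }
    }

    private final Map<String, Entry> store = new HashMap<>();

    private static long deadline(int expirySeconds, long nowSeconds) {
        return expirySeconds == 0 ? 0 : nowSeconds + expirySeconds;
    }

    /** Stores the content together with its expiry, like setting expiry on the Document. */
    void upsert(String id, int expirySeconds, String content, long nowSeconds) {
        store.put(id, new Entry(content, deadline(expirySeconds, nowSeconds)));
    }

    /** Returns null once the document has expired, as if it had been deleted. */
    String get(String id, long nowSeconds) {
        Entry entry = store.get(id);
        if (entry == null) {
            return null;
        }
        if (entry.expiresAtSeconds != 0 && nowSeconds >= entry.expiresAtSeconds) {
            store.remove(id);
            return null;
        }
        return entry.content;
    }

    /** Resets the expiry without touching the content; false if the document is gone. */
    boolean touch(String id, int expirySeconds, long nowSeconds) {
        if (get(id, nowSeconds) == null) {
            return false;
        }
        store.get(id).expiresAtSeconds = deadline(expirySeconds, nowSeconds);
        return true;
    }

    public static void main(String[] args) {
        ExpirySketch bucket = new ExpirySketch();
        bucket.upsert("expires", 2, "{\"some\":\"value\"}", 0);
        System.out.println(bucket.get("expires", 1)); // still there after 1 second
        System.out.println(bucket.get("expires", 3)); // gone after 3 seconds
    }
}
```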

Atomic Document Modifications

Additional atomic document modifications can be performed using the Java SDK. You can modify a counter document using the Bucket.counter() method. You can also use the Bucket.append() and Bucket.prepend() methods to perform raw byte concatenation.

Batching Operations

Since the Java SDK uses RxJava as its asynchronous foundation, all operations can be batched in the SDK using the asynchronous API via bucket.async() (and optionally revert back to blocking).

For implicit batching use these operators:
  • Observable.just() or Observable.from() to generate an observable that contains the data you want to batch on.
  • flatMap() to send those events against the Couchbase Java SDK and merge the results asynchronously.
  • last() if you want to wait until the last element of the batch is received.
  • toList() if you care about the responses and want to aggregate them easily.
  • cache() if you have more than one subscriber, to prevent accessing the network over and over again with every subscribe.

The following example creates an observable stream of 6 keys to load in a batch, asynchronously fires off get() requests against the SDK (notice the bucket.async().get(...)), waits until the last result has arrived, and then converts the result into a list and blocks at the very end. This pattern can be reused for mutations like upsert (as shown further down):

Cluster cluster = CouchbaseCluster.create();
Bucket bucket = cluster.openBucket();

List<JsonDocument> foundDocs = Observable
    .just("key1", "key2", "key3", "key4", "inexistentDoc", "key5")
    .flatMap(new Func1<String, Observable<JsonDocument>>() {
        @Override
        public Observable<JsonDocument> call(String id) {
            return bucket.async().get(id);
        }
    })
    .toList()
    .toBlocking()
    .single();

for (JsonDocument doc : foundDocs) {
    System.out.println(doc.id());
}
key1
key2
key3
key4
key5

Note that this always returns a list, but it may contain 0 to 6 documents (here 5) depending on how many are actually found. Also, at the very end the observable is converted into a blocking one, but everything before that, including the network calls and the aggregation, is happening completely asynchronously.

Inside the SDK, this provides much more efficient resource utilization because the requests are very quickly stored in the internal Request RingBuffer and the I/O threads are able to pick batches as large as they can. Afterward, results are added to the list in whatever order the servers return them, so responses are not serialized.
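
The shape of this pattern (start every request first, block only once at the end, silently drop the keys that are not found) does not depend on RxJava. As an illustration only, the sketch below swaps in the JDK's CompletableFuture and a plain map in place of the bucket; BatchSketch, FAKE_BUCKET and getAsync are hypothetical names, and this is not how the SDK implements batching.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

// Fan-out and collect with CompletableFuture instead of RxJava; not SDK code.
final class BatchSketch {
    // Stand-in for the bucket: five of the six requested keys exist.
    static final Map<String, String> FAKE_BUCKET = new HashMap<>();
    static {
        for (String key : Arrays.asList("key1", "key2", "key3", "key4", "key5")) {
            FAKE_BUCKET.put(key, "{\"id\":\"" + key + "\"}");
        }
    }

    /** Hypothetical async get: completes with the content, or null when the key is missing. */
    static CompletableFuture<String> getAsync(String id) {
        return CompletableFuture.supplyAsync(() -> FAKE_BUCKET.get(id));
    }

    /** Starts every request before waiting on any of them, then keeps only the hits. */
    static List<String> loadAll(List<String> ids) {
        List<CompletableFuture<String>> inFlight = new ArrayList<>();
        for (String id : ids) {
            inFlight.add(getAsync(id)); // fire and move on, no blocking here
        }
        List<String> found = new ArrayList<>();
        for (CompletableFuture<String> request : inFlight) {
            String content = request.join(); // block only at the very end
            if (content != null) {
                found.add(content);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("key1", "key2", "key3", "key4", "inexistentDoc", "key5");
        for (String content : loadAll(ids)) {
            System.out.println(content);
        }
    }
}
```

One difference from the flatMap() version is worth keeping in mind: this sketch joins the futures in request order, so its result list is ordered by key, whereas flatMap() emits results in arrival order.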

Batching mutations: The previous Java SDK only provided bulk operations for get(). With the techniques shown above, you can perform any kind of operation as a batch operation. The following code generates a number of fake documents and inserts them in one batch. Note that you can decide to either collect the results with toList() as shown above or just use last() as shown here to wait until the last document is properly inserted:

// Generate a number of dummy JSON documents
int docsToCreate = 100;
List<JsonDocument> documents = new ArrayList<JsonDocument>();
for (int i = 0; i < docsToCreate; i++) {
    JsonObject content = JsonObject.create()
        .put("counter", i)
        .put("name", "Foo Bar");
    documents.add(JsonDocument.create("doc-"+i, content));
}

// Insert them in one batch, waiting until the last one is done.
Observable
    .from(documents)
    .flatMap(new Func1<JsonDocument, Observable<JsonDocument>>() {
        @Override
        public Observable<JsonDocument> call(final JsonDocument docToInsert) {
            return bucket.async().insert(docToInsert);
        }
    })
    .last()
    .toBlocking()
    .single();

Operating with Sub-Documents

Tip: Sub-Document API is available starting Couchbase Server version 4.5. See Sub-Document Operations for an overview.

Sub-document operations save network bandwidth by allowing you to specify paths of a document to be retrieved or updated. The document is parsed on the server and only the relevant sections (indicated by paths) are transferred between client and server. You can execute sub-document operations in the Java SDK using the Bucket#lookupIn() and Bucket#mutateIn() methods.

Each of these methods accepts a key as its mandatory first argument and gives you a builder that you can use to chain several command specifications, each specifying the path to be impacted by the specified operation and a document field operand. You can find all the operations in the LookupInBuilder and MutateInBuilder classes.

bucket.lookupIn("docid")
    .get("path.to.get")
    .exists("check.path.exists")
    .execute();

boolean createParents = true;
String value = "some value"; // any JSON-compatible fragment
bucket.mutateIn("docid")
    .upsert("path.to.upsert", value, createParents)
    .remove("path.to.del")
    .execute();

All sub-document operations return a special DocumentFragment object rather than a Document. It shares the id(), cas() and mutationToken() fields of a document, but in contrast with a normal Document object, a DocumentFragment object contains multiple results with multiple statuses, one result/status pair for every input operation. It therefore exposes methods to get the content() and status() of each spec, either by index or by path. It also lets you check whether a response for a particular spec exists():

DocumentFragment<Lookup> res =
bucket.lookupIn("docid")
    .get("foo")
    .exists("bar")
    .exists("baz")
    .execute();

// First result
res.content("foo");
// or
res.content(0);

Using the content(...) methods will raise an exception if the individual spec did not complete successfully. You can also use the status(...) methods to return an error code (a ResponseStatus) rather than throw an exception.
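
The "one result/status pair per spec" idea can be illustrated with a toy lookup over nested maps. This is a simplified model, not the SDK's DocumentFragment: FragmentSketch, its Status enum and the lookupIn helper are hypothetical, only dotted object paths are handled (no array indexes), and IllegalStateException stands in for the SDK's exceptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of per-path results and statuses; not the SDK's DocumentFragment.
final class FragmentSketch {
    enum Status { SUCCESS, PATH_NOT_FOUND }

    private final List<String> paths = new ArrayList<>();
    private final List<Object> values = new ArrayList<>();
    private final List<Status> statuses = new ArrayList<>();

    /** Resolves each dotted path against a nested map and records one result per path. */
    static FragmentSketch lookupIn(Map<String, Object> document, String... requestedPaths) {
        FragmentSketch fragment = new FragmentSketch();
        for (String path : requestedPaths) {
            Object current = document;
            boolean found = true;
            for (String segment : path.split("\\.")) {
                if (current instanceof Map && ((Map<?, ?>) current).containsKey(segment)) {
                    current = ((Map<?, ?>) current).get(segment);
                } else {
                    found = false;
                    break;
                }
            }
            fragment.paths.add(path);
            fragment.values.add(found ? current : null);
            fragment.statuses.add(found ? Status.SUCCESS : Status.PATH_NOT_FOUND);
        }
        return fragment;
    }

    /** Never throws for a failed spec: it reports the failure as a status code. */
    Status status(int index) { return statuses.get(index); }
    Status status(String path) { return status(paths.indexOf(path)); }

    /** Throws if the spec at this index did not complete successfully. */
    Object content(int index) {
        if (statuses.get(index) != Status.SUCCESS) {
            throw new IllegalStateException(paths.get(index) + ": " + statuses.get(index));
        }
        return values.get(index);
    }
    Object content(String path) { return content(paths.indexOf(path)); }

    public static void main(String[] args) {
        Map<String, Object> address = new HashMap<>();
        address.put("city", "Oslo");
        Map<String, Object> doc = new HashMap<>();
        doc.put("name", "Ada");
        doc.put("address", address);

        FragmentSketch res = lookupIn(doc, "name", "address.city", "address.zip");
        System.out.println(res.content(0));
        System.out.println(res.content("address.city"));
        System.out.println(res.status("address.zip"));
    }
}
```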

Formats and Non-JSON Documents

Tip: See Non-JSON Documents for a general overview of using non-JSON documents with Couchbase
The Java SDK defines several concrete implementations of a Document to represent the various data types that it can store. Here is the complete list of document types:
Table 1. Documents with JSON content
Document Name Description
JsonDocument The default, which has a JsonObject at the top level content.
RawJsonDocument Stores any JSON value and should be used if custom JSON serializers such as Jackson or GSON are already in use.
JsonArrayDocument Similar to JsonDocument, but has a JsonArray at the top level content.
JsonBooleanDocument Stores JSON-compatible Boolean values.
JsonLongDocument Stores JSON compatible long (number) values.
JsonDoubleDocument Stores JSON compatible double (number) values.
JsonStringDocument Stores JSON compatible String values. Input is automatically wrapped with quotes when stored.
EntityDocument Used with the Repository implementation to write and read POJOs into JSON and back.
Table 2. Documents with other content
Document Name Description
BinaryDocument Can be used to store arbitrary binary data.
SerializableDocument Stores objects that implement Serializable through default Java object serialization.
LegacyDocument Uses the Transcoder from the 1.x SDKs and can be used for full cross-compatibility between the old and new versions.
StringDocument Can be used to store arbitrary strings. They will not be quoted, but stored as-is and flagged as "String".

You can implement a custom document type and associated transcoder if none of the pre-configured options are suitable for your application. A custom transcoder converts inputs to their serialized forms, and deserializes encoded data based on the item flags. There is an AbstractTranscoder that can serve as the basis for a custom implementation, and custom transcoders should be registered with a Bucket when calling Cluster#openBucket (a list of custom transcoders can be passed in one of the overloads).

Correctly Managing BinaryDocuments

The BinaryDocument can be used to store and read arbitrary bytes. It is the only default codec that directly exposes the underlying low-level Netty ByteBuf objects.

Important: Because the raw data is exposed, it is important to free it after it has been properly used. Not freeing it will result in increased garbage collection and memory leaks and should be avoided by all means. See Correctly Managing Buffers.

Because binary data is arbitrary anyway, it is backward compatible with the old SDK regarding flags so that it can be read and written back and forth. Make sure it is not compressed in the old SDK and that the same encoding and decoding process is used on the application side to avoid data corruption.

Here is some demo code that shows how to write and read raw data. The example writes binary data, reads it back, and then frees the pooled resources:

// Create buffer out of a string
ByteBuf toWrite = Unpooled.copiedBuffer("Hello World", CharsetUtil.UTF_8);

// Write it
bucket.upsert(BinaryDocument.create("binaryDoc", toWrite));

// Read it back
BinaryDocument read = bucket.get("binaryDoc", BinaryDocument.class);

// Print it
System.out.println(read.content().toString(CharsetUtil.UTF_8));

// Free the resources
ReferenceCountUtil.release(read.content());

Correctly Managing Buffers

BinaryDocument allows users to get the rawest form of data out of Couchbase. It exposes Netty's ByteBuf, byte buffers that can have various characteristics (on- or off-heap, pooled or unpooled). In general, buffers created by the SDK are pooled and off heap. You can disable the pooling in the CouchbaseEnvironment if you absolutely need that.

As a consequence, the developer must manage the memory associated with the ByteBuf more explicitly than is usual in Java.

Most notably, these byte buffers are reference counted, and you need to know three main methods associated with buffer management:
  • refCnt() gives you the current reference count. When it hits 0, the buffer is released back to its original pool, and it cannot be used anymore.
  • release() will decrease the reference count by 1 (by default).
  • retain() is the inverse of release, allowing you to prepare for multiple consumptions by external methods that you know will each release the buffer.

You can also use ReferenceCountUtil.release(something) if you don't want to check if something is actually a ByteBuf (will do nothing if it's not something that is ReferenceCounted).
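
The counting rules can be tried out with a tiny stand-in for a reference-counted buffer. This is a toy model of the semantics listed above, not Netty's ByteBuf: RefCountSketch is a hypothetical class, and IllegalStateException stands in for Netty's IllegalReferenceCountException.

```java
// Toy model of reference counting; not Netty's ByteBuf.
final class RefCountSketch {
    private int refCnt = 1; // a freshly created buffer starts with one reference
    private final byte[] data;

    RefCountSketch(byte[] data) {
        this.data = data;
    }

    int refCnt() {
        return refCnt;
    }

    /** Adds a reference, for example before handing the buffer to a method that will release it. */
    RefCountSketch retain() {
        ensureAlive();
        refCnt++;
        return this;
    }

    /** Drops a reference; returns true when this call brought the count to 0 and freed the buffer. */
    boolean release() {
        ensureAlive();
        refCnt--;
        return refCnt == 0;
    }

    byte[] content() {
        ensureAlive();
        return data;
    }

    private void ensureAlive() {
        if (refCnt == 0) {
            throw new IllegalStateException("refCnt is 0, buffer was already freed");
        }
    }

    public static void main(String[] args) {
        RefCountSketch buffer = new RefCountSketch(new byte[] {1, 2, 3});
        buffer.retain();                      // refCnt = 2
        System.out.println(buffer.release()); // refCnt = 1, not freed yet
        System.out.println(buffer.release()); // refCnt = 0, freed
        try {
            buffer.release();                 // one release too many
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```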

Important: The SDK bundles the Netty dependency into a different package so that it doesn't clash with a dependency to another version of Netty you may have. As such, you need to use the classes and packages provided by the SDK ( com.couchbase.client.deps.io.netty) when interacting with the API. For example, the ByteBuf for the content of a BinaryDocument is a com.couchbase.client.deps.io.netty.buffer.ByteBuf.

What happens if I don't release?

You leak memory. By default, Netty inspects a small percentage of ByteBuf creations and usages to try to detect leaks; when it finds one it writes a log message (look for the "LEAK" keyword).

You can make it monitor all buffers by calling ResourceLeakDetector.setLevel(ResourceLeakDetector.Level.PARANOID).
Important: Note that this incurs quite an overhead and should only be activated in tests. In production, the ADVANCED level is not as heavy as PARANOID and can be a good middle ground.

What happens if I release twice (or the SDK releases once more after I do)?

Netty will throw an IllegalReferenceCountException. A buffer whose refCnt is 0 cannot be interacted with anymore, since it has been freed back into the pool.

When must I release?

You must release whenever the SDK creates a BinaryDocument for you, which in practice means GET-type operations.

Mutation operations, on the other hand, take care of the buffer you pass in: the SDK releases it at the time the buffer is written on the wire.

When must I usually retain?

When you do a write, the buffer will usually be released by the SDK calling release(). But if you implement a kind of fallback behavior (for instance, attempt to insert() a document, catch DocumentAlreadyExistsException and then fall back to an update such as replace() instead), the SDK would attempt to release the buffer twice, which won't work.

In this case you can retain() the buffer before the first attempt, let the catch block do the extra release if something goes wrong. You have to manage the extra release if the first write succeeds, and think about catching other possible exceptions (here also an extra release is needed):

byteBuffer.retain(); //prepare for potential multi usage (+1 refCnt, refCnt = 2)
try {
   bucket.append(document);
   // refCnt = 2 on success
   byteBuffer.release(); //refCnt = 1
} catch (DocumentDoesNotExistException dneException) {
   // buffer is released on errors, refCnt = 1
   //second usage will also release, but we want to be at refCnt = 1 for the finally block
   byteBuffer.retain(); //refCnt = 2
   bucket.insert(document); //refCnt = 1
} // other uncaught errors will still cause refCnt to be released down to 1
finally {
   //we made sure that at this point refCnt = 1 in any case (success, caught exception, uncaught exception)
   byteBuffer.release(); //refCnt = 0, returned to the pool
}