About keys, values and metadata

Earlier we briefly described how Couchbase Server stores information; the server stores all data as key-value pairs. A value can be a string, an image, an integer, a serialized object, or a valid JSON document. In general, Couchbase Server does not attempt to interpret the structure of the value you provide.

Document IDs

Document IDs are assigned by the application. A valid document ID must:

  • Conform to UTF-8 encoding
  • Be no longer than 250 bytes
    Note: Bytes and characters are not the same. Most non-Latin characters occupy more than a single byte in UTF-8.

You are free to choose any ID for your document, so long as it conforms to the above restrictions. Unlike some other databases, Couchbase does not automatically generate IDs for you (but see [counter pattern]).
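
The byte-versus-character distinction is easy to check in code. The following Python sketch tests a candidate ID against the two rules above. The function name is_valid_document_id is hypothetical and is not part of any Couchbase SDK.

```python
MAX_ID_BYTES = 250  # limit stated in the rules above


def is_valid_document_id(doc_id: str) -> bool:
    """Return True if doc_id can be encoded as UTF-8 and fits in 250 bytes."""
    try:
        encoded = doc_id.encode("utf-8")
    except UnicodeEncodeError:
        # Strings containing lone surrogates cannot be encoded as UTF-8.
        return False
    # The limit applies to the encoded byte length, not the character count.
    return len(encoded) <= MAX_ID_BYTES
```

For instance, an ID made of 126 copies of "é" has only 126 characters but encodes to 252 bytes, so it fails the check even though it looks short.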

Specifying keys

Keys are unique identifiers that you provide as a parameter when you perform any operation on data. Each document you store in a data bucket must have a unique document ID, which is similar to the concept of a SQL primary key. The following applies to keys:

  • Keys are strings, typically enclosed by quotes for any given SDK.

  • No spaces are allowed in a key.

  • Separators and identifiers are allowed, such as underscore: ‘person_93847’.

  • A key must be unique within a bucket; if you store a key that already exists in the bucket, the operation either overwrites the existing value or, in the case of add(), returns an error.

  • Maximum key size is 250 bytes. Couchbase Server stores all keys in RAM and does not remove these keys to free up space in RAM, unless you use full ejection, in which case keys are ejected too. Take this into consideration when you select keys and key length for your application.

Key sizes become important when you store tens or hundreds of millions of records. One hundred million keys of 70 bytes each, plus metadata at 54 bytes each, will require about 23 GB of RAM for keys and document metadata. Metadata is 60 bytes per item as of Couchbase Server 2.0.1 and 54 bytes per item as of Couchbase Server 2.1.0.
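
The sizing arithmetic can be sketched in Python as follows. The function name and the copies parameter are hypothetical. The text does not say how the 23 GB figure is derived; one reading that reproduces it, shown here purely as an assumption, is one active copy plus one replica copy, expressed in binary gigabytes (2^30 bytes).

```python
def metadata_ram_bytes(items, key_bytes, metadata_bytes=54, copies=1):
    """Bytes of RAM needed to hold keys plus per-item metadata."""
    return items * (key_bytes + metadata_bytes) * copies


# One copy of 100 million items with 70-byte keys and 54-byte metadata.
one_copy = metadata_ram_bytes(100_000_000, 70)
print(round(one_copy / 10**9, 1))    # 12.4 (decimal GB)

# Assumption: active data plus one replica, reported in units of 2**30 bytes.
two_copies = metadata_ram_bytes(100_000_000, 70, copies=2)
print(round(two_copies / 2**30, 1))  # 23.1
```

Whatever the exact derivation, the cost grows linearly with key length, so shortening keys by 20 bytes saves 2 GB per copy at this item count.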

Specifying values

Any value you want to store in Couchbase Server will be stored as a document, or as a pure byte string. In the case of JSON documents, the JSON syntax enables you to provide context and structure for the data. The following applies to values in Couchbase Server:

  • In general, values have no implied meaning when stored in the server.

  • Integers have implicit value for particular operations, namely incrementing and decrementing. This means Couchbase Server recognizes integers as values that can be incremented and decremented.

  • Strings and serialized objects can be stored.

  • Documents stored in memcached buckets can be up to 1 MB; values stored in Couchbase buckets can be up to 20 MB.

In general, it is to your advantage to keep documents as small as possible: smaller documents require less RAM and less network bandwidth, and they allow Couchbase Server to better distribute the information across nodes.
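
A client can check the size of a value before sending it. The Python sketch below is one way to do that under stated assumptions: the helper name fits_in_bucket is hypothetical, "MB" is taken to mean 1024 * 1024 bytes, and non-string values are assumed to be serialized as compact JSON. An SDK may serialize differently, so treat the result as an estimate.

```python
import json

# Size limits from the list above; 1 MB is assumed to be 1024 * 1024 bytes.
LIMITS = {
    "memcached": 1 * 1024 * 1024,
    "couchbase": 20 * 1024 * 1024,
}


def fits_in_bucket(value, bucket_type):
    """Return True if the serialized value is within the bucket's size limit."""
    if isinstance(value, bytes):
        payload = value
    elif isinstance(value, str):
        payload = value.encode("utf-8")
    else:
        # Assumption: other values are stored as compact JSON text.
        payload = json.dumps(value, separators=(",", ":")).encode("utf-8")
    return len(payload) <= LIMITS[bucket_type]
```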

More on metadata

When you store a key-document pair in Couchbase Server, it also saves metadata that is associated with the new record. The following are the types of metadata:

  • Expiration, also known as Time to Live, or TTL.

  • Compare and swap (CAS) value.

  • Flags, which are typically SDK-specific and are often used to identify the type of data stored, or to specify formatting.

  • Sequence number, for internal server use only. The sequence number is used for conflict resolution of keys that are updated concurrently on different clusters. This conflict resolution takes place when using Couchbase's cross data center replication (XDCR). The sequence number keeps track of how many times a document is mutated. For more information about XDCR, see Cross data center replication.

CAS values enable you to store information and then require that a client provide the correct unique CAS value in order to update it. Be aware that performing an operation with CAS slows storage and retrieval. Some operations are meant to be fast, for instance append(), and you may not want to perform them with CAS. Some SDKs nonetheless require a CAS value for the operation. In that case, you can provide 0 as the CAS and the operation will execute without comparing the CAS value. For more information, see “Using Couchbase SDKs.”
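
The compare-and-swap rule described above can be illustrated with a toy in-memory model in Python. ToyStore and CasMismatch are hypothetical names; this is not a Couchbase SDK API and it does not reflect how the server generates CAS values. It only shows the check itself, including the case where a CAS of 0 skips the comparison.

```python
import itertools


class CasMismatch(Exception):
    """Raised when the supplied CAS does not match the stored one."""


class ToyStore:
    def __init__(self):
        self._items = {}                       # key -> (value, cas)
        self._cas_counter = itertools.count(1)

    def get(self, key):
        """Return (value, cas) for the key."""
        return self._items[key]

    def set(self, key, value, cas=0):
        """Store value; a non-zero cas must match the current CAS."""
        current = self._items.get(key)
        if cas != 0 and (current is None or current[1] != cas):
            raise CasMismatch(key)
        new_cas = next(self._cas_counter)      # every mutation gets a new CAS
        self._items[key] = (value, new_cas)
        return new_cas
```

In this model, a client that read the item before another client changed it holds a stale CAS, so its update raises CasMismatch and it must re-read and retry. A client that passes 0 overwrites the item unconditionally.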

Flags are used by SDKs to perform a variety of information- and SDK-specific functions. Typically a Couchbase SDK will use a flag to determine if information should be serialized or formatted in a particular way. For instance, in the case of Java, a flag can signify the data type of an object you are storing. Some SDKs expose flags for an application to handle; in other SDKs flags are handled automatically by the SDK itself. For more information about the flags unique to your chosen SDK, please refer to the SDK’s API reference.
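
As a rough illustration of how a flag can record the type of a stored value, consider the Python sketch below. The flag constants and the encode and decode helpers are hypothetical; real SDKs define their own flag values and formats, which are documented in each SDK's API reference.

```python
import json

# Hypothetical flag values, chosen only for this illustration.
FLAG_BYTES = 0
FLAG_STRING = 1
FLAG_JSON = 2


def encode(value):
    """Return (payload, flags) for a value about to be stored."""
    if isinstance(value, bytes):
        return value, FLAG_BYTES
    if isinstance(value, str):
        return value.encode("utf-8"), FLAG_STRING
    return json.dumps(value).encode("utf-8"), FLAG_JSON


def decode(payload, flags):
    """Rebuild the original value from the stored bytes and its flags."""
    if flags == FLAG_BYTES:
        return payload
    if flags == FLAG_STRING:
        return payload.decode("utf-8")
    if flags == FLAG_JSON:
        return json.loads(payload.decode("utf-8"))
    raise ValueError("unknown flags: %r" % (flags,))
```

The string "42" and the integer 42 both serialize to the same two bytes here; only the flag tells the reader which one was stored.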

Document metadata is 54 bytes per item as of Couchbase Server 2.1.0 and 60 bytes per item for Couchbase Server 2.0.1. Couchbase Server keeps all document metadata and keys in RAM and does not remove them from RAM to free up additional space. This means 100 million items with a 70-byte key and 54-byte metadata would require approximately 23 GB of RAM at run time.

As discussed previously in this guide, you can provide an explicit expiration for a record or let Couchbase assign a default. The default expiration for any given record is 0, which signifies indefinite storage: Couchbase keeps the item until you explicitly perform a delete() on that key or remove the entire bucket. Expiration times are typically set in seconds:

  • Expiration of 30 days or less: specify the number of seconds until expiration.

  • Expiration of more than 30 days: specify an absolute Unix epoch time. Milliseconds will be rounded up to the nearest second. Couchbase Server will delete the item at this time.

If you provide a time to live in seconds that is greater than the number of seconds in 30 days (60 * 60 * 24 * 30, or 2,592,000 seconds), Couchbase Server will treat it as a real Unix epoch time value rather than interpret it as a number of seconds. It will remove the item at that epoch time.
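
The 30-day rule can be written down as a small Python function. This is a sketch of the rule as stated above, not server code, and the names THIRTY_DAYS and absolute_expiry are hypothetical.

```python
from typing import Optional

THIRTY_DAYS = 60 * 60 * 24 * 30  # 2,592,000 seconds


def absolute_expiry(expiry: int, now: int) -> Optional[int]:
    """Return the Unix time at which an item expires, or None if it never does.

    expiry is the value supplied by the client; now is the current Unix time.
    """
    if expiry == 0:
        return None              # 0 means the item is stored indefinitely
    if expiry <= THIRTY_DAYS:
        return now + expiry      # interpreted as seconds from now
    return expiry                # interpreted as an absolute Unix epoch time
```

One consequence worth noting: a client that wants an item to live for 31 days and passes 31 * 24 * 60 * 60 (2,678,400) is, under this rule, naming a moment in early 1970, which is already in the past. To expire an item more than 30 days out, compute the current Unix time plus the desired number of seconds and pass that value.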

Understanding document expiration

Time to live can be a bit confusing for developers at first. There are many cases where you may set an expiration to be 30 seconds, but the record may still exist on disk after expiration.

There are two ways that Couchbase Server will remove items flagged for deletion:

  • Lazy Deletion: keys are flagged for deletion; on the next request for a flagged key, it is removed. This applies to data in Couchbase and memcached buckets.

  • Maintenance Intervals: items flagged as expired will be removed by an automatic maintenance process that runs every 60 minutes.

With lazy deletion, Couchbase Server flags an item as deleted when it receives a delete request; later, when a client tries to retrieve the item, the server returns a message that the key does not exist and actually deletes the item. Items that are flagged as expired are removed by an automatic maintenance process that runs every 60 minutes by default. To change the interval for this maintenance, set exp_pager_stime:

./cbepctl localhost:11210 set flush_param exp_pager_stime 7200

This updates the maintenance program so that it runs every two hours on the default bucket.
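
The two removal paths, lazy deletion on access and the periodic maintenance pass, can be illustrated with a toy in-memory model in Python. ToyExpiringStore and its methods are hypothetical and greatly simplified; the sketch only shows why an expired item can remain stored until one of the two mechanisms touches it.

```python
class ToyExpiringStore:
    def __init__(self):
        self._items = {}  # key -> (value, expires_at or None)

    def set(self, key, value, expires_at=None):
        self._items[key] = (value, expires_at)

    def get(self, key, now):
        """Return the value, or None if the key is missing or expired."""
        item = self._items.get(key)
        if item is None:
            return None
        value, expires_at = item
        if expires_at is not None and now >= expires_at:
            del self._items[key]  # lazy removal, triggered by this request
            return None
        return value

    def run_pager(self, now):
        """Maintenance pass: remove every expired item, return the count."""
        expired = [key for key, (_, exp) in self._items.items()
                   if exp is not None and now >= exp]
        for key in expired:
            del self._items[key]
        return len(expired)

    def stored_count(self):
        return len(self._items)
```

If two items expire at time 30 and nothing happens until time 100, both are still held by the store at that point. A get() on one of them removes it and reports it as missing; the other stays until run_pager() is called.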