vBuckets and vBucket Maps: Bucket Partitions

vBuckets help distribute data effectively across a cluster and support replicas on more than one node.

A vBucket is the owner of a subset of the key space of a Couchbase cluster. Although vBuckets are not user-accessible, they are a critical component of Couchbase Server and are vital to its high availability and elasticity.

You can access the information stored in a bucket by communicating directly with the node responsible for the corresponding vBucket. This direct access enables clients to communicate with the node storing the data, rather than using a proxy or redistribution architecture. The result abstracts the physical topology from the logical partitioning of data, giving Couchbase Server its elasticity and flexibility.

Every document ID belongs to a vBucket. A mapping function is used to calculate the vBucket to which a given document belongs. In Couchbase Server, that mapping function is a hashing function that takes a document ID as input and generates a vBucket identifier as output. After the vBucket identifier is computed, a table is consulted to look up the server that "hosts" that vBucket. The table, which contains one row per vBucket, pairs each vBucket with its hosting server. A server appearing in this table can be responsible for multiple vBuckets.
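
The two-stage lookup described above (hash the document ID, then consult the table) can be sketched in a few lines of Python. This is a minimal illustration, not the Couchbase client implementation: the CRC32-modulo hash is a stand-in for whatever hashing function the server and clients actually use, and `NUM_VBUCKETS`, `vbucket_for`, `server_for`, and the round-robin table layout are hypothetical names and values chosen for readability.

```python
import zlib

# Assumption: a small vBucket count so the table is easy to read.
NUM_VBUCKETS = 16

def vbucket_for(doc_id: str, num_vbuckets: int = NUM_VBUCKETS) -> int:
    """Map a document ID to a vBucket identifier using a stand-in hash."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_vbuckets

# One row per vBucket: the index is the vBucket identifier and the value is
# the hosting server. The round-robin assignment is purely illustrative.
servers = ["Server A", "Server B", "Server C"]
vbucket_map = [servers[vb % len(servers)] for vb in range(NUM_VBUCKETS)]

def server_for(doc_id: str) -> str:
    """Hash the document ID, then look up the hosting server in the table."""
    return vbucket_map[vbucket_for(doc_id)]

if __name__ == "__main__":
    print("KEY -> vB%d on %s" % (vbucket_for("KEY"), server_for("KEY")))
```

Because the table has more rows than there are servers, each server appears in several rows, which is how one server ends up responsible for multiple vBuckets.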

Consider a scenario where a cluster contains three servers. The following diagram shows how the Key to Server mapping (vBucket mapping) works when a client looks up the value of KEY using the GET operation.
Figure 1. vBucket mapping using the GET operation

  1. By hashing the key, the client calculates the vBucket which owns KEY. In this example, the hash resolves to vBucket 8 (vB8).
  2. The client examines the vBucket map to determine that Server C hosts vB8.
  3. The client sends the GET operation directly to Server C.

Consider a second scenario where a server is added to the original cluster of three servers. After a new node, Server D, is added to the cluster, the vBucket map is updated during the rebalance operation. The updated map is then sent to all the cluster participants, including other nodes, any connected smart clients, and the Moxi proxy service. The following diagram shows the vBucket mapping for the updated cluster containing four nodes.
Figure 2. vBucket mapping using the GET operation

When a client looks up the value of KEY using the GET operation in the updated cluster, the hashing algorithm still resolves to vBucket 8 (vB8). However, the new vBucket map maps vB8 to Server D. The client then sends the GET operation directly to Server D.
Note: This architecture enables Couchbase Server to cope with changes without using the typical RDBMS sharding method.
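
The effect of the map update in the second scenario can be sketched as follows. This is an illustration under stated assumptions rather than the actual rebalance algorithm: the 12-row table, the `rebalance` helper, and the hand-picked set of moved vBuckets are all hypothetical, and a real rebalance also transfers the data itself. The point is only that the key's vBucket identifier stays the same while the table row for that vBucket changes.

```python
# Assumption: 12 vBuckets spread evenly over the original three servers,
# with vB8 hosted on Server C as in the first scenario.
old_map = ["Server A"] * 4 + ["Server B"] * 4 + ["Server C"] * 4

def rebalance(vbucket_map, moves):
    """Return a new map in which the listed vBuckets have a new host.

    `moves` maps a vBucket identifier to its new server. The choice of
    which vBuckets to move is supplied by the caller in this sketch.
    """
    new_map = list(vbucket_map)
    for vb, server in moves.items():
        new_map[vb] = server
    return new_map

# Hand-picked moves: one vBucket from each original server goes to Server D.
new_map = rebalance(old_map, {3: "Server D", 7: "Server D", 8: "Server D"})

vb = 8  # the vBucket that KEY hashes to in the example; hashing is unchanged
moved = [i for i in range(len(old_map)) if old_map[i] != new_map[i]]

if __name__ == "__main__":
    print("vB%d: %s -> %s" % (vb, old_map[vb], new_map[vb]))
    print("vBuckets that changed host:", moved)
```

Only the rows that moved to the new server differ between the two maps, so a client holding the updated map needs no change to its hashing logic.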

Additionally, this architecture is different from the method used by memcached, which uses client-side key hashes to determine the server from a defined list. The memcached method requires active management of the list of servers and specific hashing algorithms such as Ketama to cope with changes to the topology.
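
To see why hashing keys directly against a server list needs extra care when the topology changes, consider the following sketch. It is an assumption-laden illustration, not memcached client code: it uses a naive CRC32-modulo selection over the server list (the simple scheme that algorithms such as Ketama are designed to improve on), and the `pick_server` helper and the generated key names are hypothetical.

```python
import zlib

def pick_server(key: str, servers: list) -> str:
    """Naive client-side selection: hash the key straight onto the list."""
    return servers[zlib.crc32(key.encode("utf-8")) % len(servers)]

three = ["Server A", "Server B", "Server C"]
four = three + ["Server D"]

# Hypothetical sample of keys used to measure how many change servers.
keys = ["key-%d" % i for i in range(1000)]
moved = sum(1 for k in keys if pick_server(k, three) != pick_server(k, four))

if __name__ == "__main__":
    print("%d of %d keys map to a different server" % (moved, len(keys)))
```

With this naive scheme, adding one server changes the result of the modulo for most keys, whereas in the vBucket approach the key-to-vBucket hash never changes and only the rows of the vBucket map that were reassigned differ.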