Data Durability refers to the fault tolerance and persistence of data in the face of software or hardware failure. Even the most reliable software and hardware might fail at some point, introducing the possibility of data loss. This page discusses what happens to data when a server node fails, and what steps you can take to ensure your data remains durable, safeguarding it against potential data loss.
Couchbase Server’s architecture guards against most forms of failure and protects against data loss. Buckets can be configured for replication, which keeps redundant copies of the data so that individual copies can fail: as long as the data is available somewhere, it is not lost.
Data is also written to the disk, so in the case of a power outage or software crash, data can be retrieved from the disk during recovery.
By default, Couchbase does not immediately replicate or write data to disk. After an item is modified, Couchbase immediately stores these modifications in memory on the node that is active for the given item. It then places the item inside a disk queue and a replication queue. The disk queue is served by another thread that persists the change to the disk, and the replication queue is served by another thread that replicates the change to any replicas within the cluster. This is done to improve performance, because modifying an item in memory is typically very quick, whereas sending an item over the network or writing it to the disk may take longer. In most cases items are replicated or persisted within a few milliseconds although the operations may take longer if the storage medium is slow or the network is congested.
If a node fails, any modifications still in its queues are lost, and when it recovers it will contain only the modifications that had already been stored to disk. If the failed node is failed over, a replica node (more precisely, a replica for the given vBucket) will take over, but it will only know about the modifications that were successfully replicated to it: any modifications in the failed node’s replication queue are lost.
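The behavior described above can be illustrated with a toy model. This is a minimal sketch, not Couchbase code: the `Node` class, its method names, and the single-threaded "drain" calls are assumptions made for illustration, standing in for the background threads that serve the two queues.

```python
from collections import deque


class Node:
    """Toy node with memory, durable disk, and two pending queues (hypothetical)."""

    def __init__(self):
        self.memory = {}
        self.disk = {}
        self.disk_queue = deque()
        self.replication_queue = deque()

    def write(self, key, value):
        # Synchronous part: update memory, enqueue the rest, acknowledge.
        self.memory[key] = value
        self.disk_queue.append((key, value))
        self.replication_queue.append((key, value))
        return "success"

    def flush_disk(self):
        # Stands in for the thread that drains the disk queue.
        while self.disk_queue:
            key, value = self.disk_queue.popleft()
            self.disk[key] = value

    def replicate_to(self, replica):
        # Stands in for the thread that drains the replication queue.
        while self.replication_queue:
            key, value = self.replication_queue.popleft()
            replica.memory[key] = value

    def crash_and_recover(self):
        # Memory and both queues are lost; only persisted data survives.
        self.disk_queue.clear()
        self.replication_queue.clear()
        self.memory = dict(self.disk)


active, replica = Node(), Node()
active.write("a", 1)
active.flush_disk()
active.replicate_to(replica)
active.write("b", 2)  # acknowledged, but only in memory and in the queues
active.crash_and_recover()

print(active.memory)   # {'a': 1}
print(replica.memory)  # {'a': 1}
```

The write of `"b"` was acknowledged to the caller, yet neither the recovered node nor the replica knows about it, which is the window that durability requirements are meant to close.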
As mentioned before, modifications to items are done synchronously in memory and asynchronously on the disk and network. The synchronous modification in memory involves modifying the item and sending a success response to the client. This tells your application that the item has successfully been stored/modified, and allows your application to continue operating.
Applications may specify requirements for durability to the Couchbase client to wait until the item has also been persisted and replicated. This ensures greater durability of your data against failure in certain edge cases, such as a slow network or storage media. Operations performed using the durability requirement options will appear as failures to the application if the modification could not be successfully stored to disk and/or replicated to replica nodes. Upon receiving such a failure response, the application may retry the operation or mark it as a failure. In any event, the application will never be under the mistaken assumption that the modification was performed when in fact it was lost inside one of the aforementioned queues.
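One way an application might react to such a failure is to retry a bounded number of times before giving up. The following is a minimal sketch under stated assumptions: `DurabilityError` and `with_retries` are hypothetical names, not part of any Couchbase SDK, and the `operation` callable stands in for a mutation issued with durability requirements.

```python
import time


class DurabilityError(Exception):
    """Hypothetical error: the durability requirement was not met."""


def with_retries(operation, attempts=3, delay=0.0):
    """Call operation() until it succeeds or the attempts are exhausted."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return operation()
        except DurabilityError as err:
            last_error = err
            if attempt < attempts - 1:
                time.sleep(delay)  # brief pause before the next try
    # Every attempt failed: surface the failure instead of hiding it.
    raise last_error


# Usage with a stand-in operation that fails twice, then succeeds.
state = {"calls": 0}

def flaky_store():
    state["calls"] += 1
    if state["calls"] < 3:
        raise DurabilityError("not yet persisted or replicated")
    return "stored"

print(with_retries(flaky_store))  # stored
```

Retrying is safest for operations that can be repeated without side effects, such as an upsert of the same value; whether that holds is for the application to decide.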
Durability requirements may be specified as an extra option to mutation operations (such as upsert). The level of durability is configurable, depending on the needs of the application, through two parameters:
- The replication factor of the operation: how many replicas must this operation be propagated to. This is specified as the ReplicateTo option in most SDKs.
- The persistence of the modification: how many persisted copies of the modified record must exist. Often found as a PersistTo option.
Examples of durability requirements that can be expressed this way:
- Ensure the modification is stored (on disk) on the active (master) node.
- Ensure the modification exists (in memory) on at least one replica.
- Ensure the modification is stored (on disk) on at least one replica.
- Ensure the modification exists (in memory) on all replicas.
- Ensure the modification is stored (on disk) on the active node, and exists (in memory) on all replicas.
- Ensure the modification is stored (on disk) on the active node, and on all replicas.
When an operation is propagated to a replica, it is first propagated to the replica’s memory and then placed in a queue (as above) to be persisted to the replica’s disk. You can control whether you desire the operation to only be propagated to a replica’s memory (replicate to), or whether to wait until it is fully persisted to the replica’s disk (persist to).
persist to receives a value equal to the number of nodes to which an operation should be persisted. The maximum value is the number of replica nodes plus one. A value of 1 indicates master-only persistence (and no replica persistence).
replicate to receives a value equal to the number of replica nodes to which the operation should be replicated. Its maximum value is the number of replica nodes.
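The limits on the two values can be written down as a small check. This is a sketch only: the function name `check_durability` and its parameter names are assumptions for illustration, and real SDKs expose these settings through their own option types.

```python
def check_durability(persist_to, replicate_to, num_replicas):
    """Validate persist to / replicate to against the replica count.

    persist_to counts the active (master) node plus replicas, so its
    maximum is num_replicas + 1; replicate_to counts replicas only.
    """
    if not 0 <= persist_to <= num_replicas + 1:
        raise ValueError(
            f"persist_to must be between 0 and {num_replicas + 1}"
        )
    if not 0 <= replicate_to <= num_replicas:
        raise ValueError(
            f"replicate_to must be between 0 and {num_replicas}"
        )
    return persist_to, replicate_to


# With two replicas: master-only persistence, no replication wait.
print(check_durability(persist_to=1, replicate_to=0, num_replicas=2))  # (1, 0)
# With two replicas: persisted everywhere, replicated to every replica.
print(check_durability(persist_to=3, replicate_to=2, num_replicas=2))  # (3, 2)
```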
Note that Couchbase Server’s design of asynchronous persistence and replication is there for a reason: the time it takes to store an item in memory is several orders of magnitude quicker than persisting it to disk or sending it over the network. Operation times and latencies will increase when waiting on the speed of the network and the speed of the disk.
A common requirement is that an operation should be propagated to at least one additional location before continuing. An additional copy can either be in the master’s storage (master persistence), or in the memory of a replica node. Depending on your environment either persistence or replication may be quicker: replication will be quicker on a 10GE network with commodity magnetic drives, but persistence will be quicker on a 1GE network with SSD drives.
Durability operations may also cause increased network traffic, since they are implemented at the client level by sending special probes to each server once the actual modification has completed.
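The client-side probing described above amounts to a polling loop. The sketch below is an illustration under assumptions, not the SDK's implementation: `probe` is a hypothetical callable that returns one status dictionary per node, and the keys `role`, `on_disk`, and `in_memory` are invented for this example.

```python
import time


def requirements_met(statuses, persist_to, replicate_to):
    """Count persisted copies (any node) and replicated copies (replicas only)."""
    persisted = sum(1 for s in statuses if s["on_disk"])
    replicated = sum(
        1 for s in statuses
        if s["role"] == "replica" and (s["in_memory"] or s["on_disk"])
    )
    return persisted >= persist_to and replicated >= replicate_to


def wait_for_durability(probe, persist_to, replicate_to,
                        timeout=1.0, interval=0.01):
    """Poll the nodes until the requirement is met or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if requirements_met(probe(), persist_to, replicate_to):
            return True
        if time.monotonic() >= deadline:
            return False  # reported to the application as a failure
        time.sleep(interval)


# Usage with a fake probe whose second reply shows the change persisted
# on the active node and present in the replica's memory.
replies = [
    [{"role": "active", "on_disk": False, "in_memory": True},
     {"role": "replica", "on_disk": False, "in_memory": False}],
    [{"role": "active", "on_disk": True, "in_memory": True},
     {"role": "replica", "on_disk": False, "in_memory": True}],
]
calls = []

def fake_probe():
    calls.append(1)
    return replies[min(len(calls), len(replies)) - 1]

ok = wait_for_durability(fake_probe, persist_to=1, replicate_to=1)
print(ok, len(calls))  # True 2
```

Each pass through the loop costs one round of probes to the nodes, which is where the extra network traffic comes from.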
Enhanced Durability
Starting with version 4.0, Couchbase Server offers additional durability options which provide clearer semantics in the face of failover.
Prior to Couchbase Server 4.0, durability was only implemented using the CAS. Since the CAS changed for each mutation, a client could normally determine if the mutation was persisted on a given node if the item existed on the node with the same CAS.
The core issue with CAS-based checking is that a different CAS on a replica may either be a result of:
- Client contention: The CAS is different from the current mutation because it has been changed by a subsequent mutation (one performed after ours, by another client)
- Slow propagation: The mutation has not yet been propagated to all replicas
- Failover: The CAS has changed because the master was failed over and control has been transferred to a replica which is using an older copy.
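The ambiguity can be seen in a small sketch. The CAS values below are made-up integers chosen for illustration (real CAS values are opaque), and `cas_check` is a hypothetical helper rather than an SDK function.

```python
def cas_check(mutation_cas, node_cas):
    """Return True only if the node holds exactly the version we wrote."""
    return node_cas == mutation_cas


our_cas = 1001  # CAS returned for our mutation (made-up value)

# CAS that a replica might report in each of the three situations.
scenarios = {
    "client contention": 1002,  # a later mutation replaced ours
    "slow propagation": 1000,   # replica still holds the previous version
    "failover": 1000,           # promoted replica is using an older copy
}

results = {name: cas_check(our_cas, cas) for name, cas in scenarios.items()}
print(results)  # every scenario yields False
```

All three situations produce the same answer, a mismatch, so the client cannot tell whether to keep waiting, treat the mutation as superseded, or treat it as lost.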
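Enhanced durability instead relies on sequence numbers. Each mutation is associated with two values:
- A random identifier (called a UUID) representing the current cluster state. This state is changed whenever something happens to the cluster (for example, a failover).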
- A counter value which is associated with the current state.
Applications may enable this extra sequence data to be sent with each mutation response. The SDK then transparently uses this data to perform sequence number-based durability checking, comparing it with what each node reports:
- If the UUID is equal, but the sequence number is lower than the expected value, then it means the mutation has not yet been propagated.
- If the UUID is equal, and the sequence number is greater than or equal to that of the current mutation, it means the current mutation has been propagated.
- If the UUID is different from the one available at the client, it means the mutation has been lost.
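The three rules above map directly onto a small decision function. This is a sketch for illustration: the function name, the parameter names, and the string results are assumptions, and the values in the usage lines are invented.

```python
def check_mutation(token_uuid, token_seqno, node_uuid, node_seqno):
    """Classify a mutation by comparing its token with a node's state.

    token_*: UUID and sequence number returned with the mutation response.
    node_*:  UUID and latest sequence number reported by the node.
    """
    if node_uuid != token_uuid:
        return "lost"        # cluster state changed, e.g. after a failover
    if node_seqno >= token_seqno:
        return "propagated"  # the node has caught up to this mutation
    return "pending"         # same state, but not yet propagated


print(check_mutation("abc", 42, "abc", 40))  # pending
print(check_mutation("abc", 42, "abc", 42))  # propagated
print(check_mutation("abc", 42, "xyz", 50))  # lost
```

Unlike the CAS comparison, the three outcomes are distinguishable, so the client knows whether to keep polling, report success, or report failure.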