Batching Operations
Edit this article in GitHub
Version 2.3

Batching Operations

Batching operations allows you to make better utilization of your network and speed up your application by increasing network throughput and reducing latency. Batched operations work by pipelining requests over the network. When requests are pipelined, they are sent in one large group to the cluster. The cluster in turn pipelines responses back to the client. When operations are batched, there are fewer IP packets to be sent over the network (since there are fewer individual TCP segments).

You can batch operations via the Couchbase SDK. Batching operations is done either via explicit multi methods, or by scheduling multiple operations in sequence.
Note: The availability of multi APIs or the ability to explicitly schedule and wait for operations is dependent on language and SDK features. Refer to the SDK documentation for more specifics.

Asynchronous batching

Asynchronous clients inherently batch operations: because the application receives the response at a later stage in the application, it may attain batching via issuing many requests in sequence. When using an SDK in an asynchronous (non-blocking) model, all requests are inherently batched.

Performance benefits

IP packets and TCP segments carry their own overhead, so no matter how small the contents, there is still a 20 byte overhead for IP, a 10 byte overhead for TCP, and a 26 byte overhead for Ethernet for each TCP packet; making the total overhead for each request close to 60 bytes. When multiple requests are batched into a single IP header or TCP segment, the IP/TCP overhead is incurred only when a new TCP segment (or IP packet) is actually needed to accommodate more requests.

Reducing the number of packets is also more efficient for the CPU, as it requires fewer system calls to send/receive data over the network: Since the data is in a single buffer, it can be sent at once using a single system call and its responses received with a single system call. When using non-batched operations, each request and response will need to make its own system call.

Due to Couchbase’ distributed architecture, multi-item access may result in multiple servers being contacted. For instance, when accessing 15 items on a 4 node cluster, roughly 3 items will exist on each node. When operations are batched, the SDK can efficiently parallelize requests to each involved server, allowing all items to be access quicker. On a per-server basis, batching requests allows each individual node to fetch relevant items in parallel from its memory.

Error handling

Operation batching is done purely on the client and network. Couchbase does not have a concept of transactional batched operations - in which all operations either succeed or fail. As such, some operations may succeed and some may fail when batching commands.

Consult your SDK documentation on how to distinguish successes and failures in batched operation.

Batching guidelines

Batching improves network utilization. However there is a batching threshold at which the maximum network efficiency is gained - and batching beyond this amount will simply increase memory and CPU usage on the client, and in some cases cause operations to prematurely time-out or otherwise fail.

As a guideline, applications should batch no more than 1MB before sending to the server. Calculating the 1 MB value is dependent on the length of the key and value (where applicable) of each operation.

Note that this is just a guideline. The limit may be higher for extremely efficient networks (such as 10-gigabit Ethernet). It is recommended you benchmark your network to get ideal performance numbers. The limit may be lower for congested networks or slow server nodes (for example, a shared development VM with low resources).

The [cbc-pillowfight] utility may be used to appropriately determine the correct batch size for a cluster.

When calculating the batch size, also consider that each operation has a 24 byte overhead at the protocol level:

Sizing batches: examples

When storing items, with each key being approximately 10 bytes long and each value being approximately 4000 bytes long, estimate the following:
  1. Calculate Bytes per operation:
    • 10 (Key)
    • 4000 (Value)
    • 24 (Memcached Packet)
    • Total: 4034.
  2. Divide 1 megabyte by the total size of an operation:
    • 1048576 / 4034
    • Batch Size: 259
The 24 byte overhead becomes more evident when dealing with smaller values. Assuming an average key size of 5 and an average value size of 50:
  1. Calculate bytes per operation:
    • 5 (Key)
    • 50 (value)
    • 24 (Packet)
    • Total: 74
  2. Divide 1 megabyte by the total size of an operation:
    • Batch Size: 14169

Batching operations using SDK

The following example is written in Python, and demonstrates properly splitting operations for batch sizes. It uses a 256K batch size and handles errors:

#!/usr/bin/env python
from couchbase.bucket import Bucket
from couchbase.exceptions import CouchbaseTransientError

cb = Bucket('couchbase://')

BYTES_PER_BATCH = 1024 * 256 # 256K

# Generate our data:
all_data = {}
for x in xrange(100000):
    key = "Key_" + str(x)
    value = { 'value': x }
    all_data[key] = value

batches = []
cur_batch = {}
cur_size = 0

for key, value in all_data.items():
    cur_batch[key] = value
    cur_size += len(key) + len(value) + 24
    if cur_size > BYTES_PER_BATCH:
        cur_batch = {}
        cur_size = 0

print "Have {} batches".format(len(batches))
num_completed = 0
while batches:
    batch = batches[-1]
        num_completed += len(batch)
    except CouchbaseTransientError as e:
        print e
        ok, fail = e.split_results()
        new_batch = {}
        for key in fail:
            new_batch[key] = all_data[key]
        num_completed += len(ok)
        print "Retrying {}/{} items".format(len(new_batch), len(ok))

    print "Completed {}/{} items".format(num_completed, len(all_data))