The primary considerations for sizing a Couchbase Server cluster are the number of nodes and node size.
When sizing your Couchbase Server cluster, ask the following questions:
- How many nodes do I need?
- How large (RAM, CPU, disk space) must those nodes be?
To determine the number of nodes needed for a cluster, consider the following:
- RAM
- Disk throughput and sizing
- Network bandwidth
- Data distribution and safety
Due to the in-memory nature of Couchbase Server, RAM is usually the determining factor for sizing. However, the primary sizing factor depends on the actual data set and the information being stored. For example:
- If you have a very small data set with a very high load, size the cluster based more on network bandwidth than on RAM.
- If you have a very high write rate, you’ll need more nodes to support the disk throughput that is needed to persist all that data (and likely more RAM to buffer the incoming writes).
- Even with a very small dataset under low load, you might want three nodes for proper distribution and safety.
You can increase the capacity of your cluster (RAM, disk, CPU, or network) by increasing the number of nodes within the cluster. Each of these limits grows linearly with the cluster size.
- RAM sizing: RAM is usually the most critical sizing parameter. It is also the one that can have the most impact on performance and stability.
- Working set: The working set is the data that the client application actively uses at any point in time. Ideally, all of the working set lives in memory, so its size determines how much memory you need.
- Memory quota: It is very important that your Couchbase cluster's size corresponds to the working set size and total data you expect. The goal is to size the available RAM so that all your document IDs, the document ID metadata, and the working set values fit. Memory usage should stay just below the point at which Couchbase Server starts ejecting values from memory (the High Water Mark).
How much memory and disk space per node you will need depends on several different variables.
The following are per-bucket calculations and must be summed up across all buckets. If all the buckets have the same configuration, treat the total data as a single bucket. There is no per-bucket overhead that needs to be considered.
|documents_num||The total number of documents you expect in your dataset|
|ID_size||The average size of document IDs|
|value_size||The average size of values|
|number_of_replicas||The number of copies of the original data you want to keep|
|working_set_percentage||The percentage of your data you want in memory|
|per_node_ram_quota||How much RAM can be assigned to Couchbase|
Use the following items to calculate how much memory you need:
|Metadata per document (metadata_per_document)||This is the amount of memory that Couchbase needs to store metadata per document. Metadata uses 56 bytes. All the metadata needs to live in memory while a node is running and serving data.|
|SSD or Spinning||SSDs give better I/O performance.|
|headroom||The cluster needs additional overhead to store metadata called headroom , which requires approximately 25–30% more space than the raw RAM requirements for your dataset. Since SSDs are faster than spinning (traditional) hard disks, you should set aside 25% of memory for SSDs and 30% of memory for spinning hard disks.|
|High Water Mark (high_water_mark)||By default, the high water mark for a node’s RAM is set at 85%.|
The rough guidelines to size your cluster are as follows:
|no_of_copies||1 + number_of_replicas|
|total_metadata (all of the metadata must live in memory)||(documents_num) * (metadata_per_document + ID_size) * (no_of_copies)|
|total_dataset||(documents_num) * (value_size) * (no_of_copies)|
|working_set||total_dataset * (working_set_percentage)|
|Cluster RAM quota required||(total_metadata + working_set) * (1 + headroom) / (high_water_mark)|
|number of nodes||Cluster RAM quota required / per_node_ram_quota|
The following is a sample sizing calculation:
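|documents_num||1,000,000|
|ID_size||100 bytes|
|value_size||10,000 bytes|
|number_of_replicas||1|
|working_set_percentage||20%|
|Type of Storage||SSD|
|headroom||25% (SSD)|
|high_water_mark||85%|
|metadata_per_document||56 for 2.1 and higher, 64 for 2.0.x|
|no_of_copies||= 1 + 1 = 2 (one original and one replica)|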
|total_metadata||= 1,000,000 * (100 + 56) * (2) = 312,000,000|
|total_dataset||= 1,000,000 * (10,000) * (2) = 20,000,000,000|
|working_set||= 20,000,000,000 * (0.2) = 4,000,000,000|
|Cluster RAM quota required||= (312,000,000 + 4,000,000,000) * (1+0.25)/(0.85) = 6,341,176,470|
For example, if you have 8 GB machines and you want to use 6 GB of each for Couchbase:
number of nodes = Cluster RAM quota required / per_node_ram_quota = 6.34 GB / 6 GB = 1.06, which rounds up to 2 nodes
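The sample calculation can be reproduced with a short script. This is only a sketch of the rough guidelines above, not a Couchbase tool: the function names `cluster_ram_quota` and `nodes_needed` are made up for illustration, all sizes are assumed to be in bytes, and 1 GB is taken as 10^9 bytes.

```python
import math


def cluster_ram_quota(documents_num, id_size, value_size, number_of_replicas,
                      working_set_percentage, metadata_per_document=56,
                      headroom=0.25, high_water_mark=0.85):
    """Return the cluster RAM quota in bytes, following the rough guidelines."""
    no_of_copies = 1 + number_of_replicas
    # Metadata and IDs for every document must fit in memory.
    total_metadata = documents_num * (metadata_per_document + id_size) * no_of_copies
    total_dataset = documents_num * value_size * no_of_copies
    # Only the working set portion of the values needs to stay in memory.
    working_set = total_dataset * working_set_percentage
    return (total_metadata + working_set) * (1 + headroom) / high_water_mark


def nodes_needed(cluster_quota, per_node_ram_quota):
    """Round up, because a fraction of a node cannot be provisioned."""
    return math.ceil(cluster_quota / per_node_ram_quota)


# Sample inputs: 1,000,000 documents, 100-byte IDs, 10,000-byte values,
# one replica, 20% working set, SSD headroom (25%), 85% high water mark.
quota = cluster_ram_quota(1_000_000, 100, 10_000, 1, 0.2)
nodes = nodes_needed(quota, 6 * 10**9)

print(f"Cluster RAM quota required: {quota / 10**9:.2f} GB")
print(f"Number of nodes: {nodes}")
```

With the sample inputs, the quota comes to about 6.34 GB, which rounds up to 2 nodes at 6 GB per node. For spinning disks, pass `headroom=0.30` instead.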
Network bandwidth is not normally a significant factor to consider for cluster sizing. However, clients require network bandwidth to access information in the cluster. Nodes also need network bandwidth to exchange information (node to node).
In general, calculate your network bandwidth requirements using the following formula:
Bandwidth = (operations per second * item size) + overhead for rebalancing
Calculate the operations per second with the following formula:
Operations per second = Application reads + (Application writes * Replica copies)
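The two formulas can be combined in a few lines. The following is a minimal sketch; the function names and all input values (10,000 reads and 2,000 writes per second, 2 replica copies, 1,000-byte items, no rebalance overhead) are hypothetical and chosen only to show the arithmetic.

```python
def operations_per_second(app_reads, app_writes, replica_copies):
    """Operations per second = application reads + (application writes * replica copies)."""
    return app_reads + app_writes * replica_copies


def bandwidth_bytes_per_second(ops_per_second, item_size, rebalance_overhead=0):
    """Bandwidth = (operations per second * item size) + overhead for rebalancing."""
    return ops_per_second * item_size + rebalance_overhead


# Hypothetical workload, for illustration only.
ops = operations_per_second(app_reads=10_000, app_writes=2_000, replica_copies=2)
bandwidth = bandwidth_bytes_per_second(ops, item_size=1_000)

print(f"Operations per second: {ops}")
print(f"Bandwidth: {bandwidth * 8 / 10**6:.0f} Mbit/s")
```

Converting the result to megabits per second makes it easier to compare against the capacity of the network interface.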
Make sure you have enough nodes (and the right configuration) in your cluster to keep your data safe. There are two areas to keep in mind: how you distribute data across nodes and how many replicas you store across your cluster.
Data distribution: More nodes are better than fewer. If you only have two nodes, your data is split across the two nodes, and half of your dataset is impacted if one goes away. On the other hand, with ten nodes, only 10% of the dataset is impacted if one node goes away. Even with automatic failover, there still is a period when data is unavailable if nodes fail, which is mitigated by having more nodes.
After a failover, the cluster takes on an extra load. The question is how heavy that extra load is and whether you are prepared for it. Again, with only two nodes, each one needs to be ready to handle the entire load. With ten, each node only needs to be able to take on an extra tenth of the workload should one fail.
While two nodes do provide a minimal level of redundancy, we recommend that you always use at least three nodes.
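The effect of cluster size on a single node failure can be estimated with simple arithmetic. The sketch below assumes that data and load are spread perfectly evenly across nodes, which is an idealization; `failure_impact` is a hypothetical helper, not part of Couchbase Server.

```python
def failure_impact(node_count):
    """Estimate the effect of losing one node in an evenly balanced cluster.

    Returns a tuple of:
      - the fraction of the dataset affected by the failed node
      - the factor by which each surviving node's load grows after failover
    """
    if node_count < 2:
        raise ValueError("need at least two nodes for a failover")
    data_impacted = 1 / node_count
    load_before = 1 / node_count        # each node's share before the failure
    load_after = 1 / (node_count - 1)   # each survivor's share after failover
    return data_impacted, load_after / load_before


for n in (2, 3, 10):
    impacted, growth = failure_impact(n)
    print(f"{n} nodes: {impacted:.0%} of data impacted, "
          f"survivor load grows by a factor of {growth:.2f}")
```

Under these assumptions, a two-node cluster doubles the load on the surviving node, while in a ten-node cluster each survivor's load grows by roughly a tenth.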
Couchbase Server allows you to configure up to three replicas (creating four copies of the dataset). In the event of a failure, you can only "fail over" (either manually or automatically) as many nodes as you have replicas. For example:
- In a five-node cluster with one replica, if one node goes down, you can fail it over. If a second node goes down, you no longer have enough replica copies to fail over to and will have to go through a slower process to recover.
- In a five-node cluster with two replicas, if one node goes down, you can fail it over. If a second node goes down, you can fail it over as well. Should a third one go down, you no longer have replicas to fail over to.
After a node goes down and is failed over, try to replace that node as soon as possible and rebalance. The rebalance recreates the replica copies (if you still have enough nodes to do so).
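The relationship between replica count and failover capacity can be expressed as a simple rule. The sketch below is a simplified model that assumes nodes fail one after another with no rebalance in between; `failover_plan` and its return labels are hypothetical names used for illustration.

```python
def failover_plan(failed_nodes, number_of_replicas):
    """Classify each successive node failure, assuming no rebalance in between.

    A node can be failed over only while replica copies remain, so at most
    `number_of_replicas` consecutive failures can be handled by failover.
    """
    if not 0 <= number_of_replicas <= 3:
        raise ValueError("Couchbase Server supports up to three replicas")
    return [
        "fail over" if failure <= number_of_replicas else "slower recovery"
        for failure in range(1, failed_nodes + 1)
    ]


# The two scenarios described above.
print(failover_plan(failed_nodes=2, number_of_replicas=1))
print(failover_plan(failed_nodes=3, number_of_replicas=2))
```

Replacing the failed node and rebalancing recreates the replica copies, which in this model resets the count of failures that can be handled.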
As a rule of thumb, configure the following number of replicas:
- One replica for up to five nodes.
- One or two replicas for five to ten nodes.
- One, two, or three replicas for over ten nodes.
In general, Couchbase Server has very low hardware requirements and is designed to be run on commodity or virtualized systems. However, as a rough guide to the primary concerns for your servers, the following is recommended:
- RAM: This is your primary consideration. We use RAM to store active items, and that is the key reason Couchbase Server has such low latency.
- CPU: Couchbase Server has very low CPU requirements. The server is multi-threaded and, therefore, benefits from a multi-core system. We recommend machines with at least four or eight physical cores.
- Disk: By decoupling the RAM from the I/O layer, Couchbase Server can support low-performance disks better than other databases. As a best practice, have separate devices for server install, data directories, and index directories.
- Network: Most configurations work with Gigabit Ethernet interfaces. Faster solutions such as 10 Gigabit Ethernet and InfiniBand will provide spare capacity.
Known working configurations include SAN, SAS, SATA, SSD, and EBS with the following recommendations:
- SSDs have been shown to provide a great performance boost both in terms of draining the write queue and also in restoring data from disk (either on cold-boot or for purposes of rebalancing).
- RAID provides better throughput and reliability.
- Striping across EBS volumes (in Amazon EC2) has been shown to increase throughput.
Considerations for cloud environments (for example, Amazon EC2)
Because I/O performance in cloud environments is often unreliable and inconsistent, you should lower the per-node RAM footprint and increase the number of nodes. Doing so gives you better disk throughput and improves rebalancing, since each node has to store (and transmit) less data. Distributing the data further also reduces the impact of losing a single node.