Rebalancing factors

Choosing when, why, and how to rebalance your cluster depends on the scenario.

Deciding which of the following situations applies is not always straightforward, and various indicators determine when to change the node configuration and when to perform a rebalance.

When to expand your cluster

You can increase the size of your cluster by adding more nodes. Adding nodes increases the RAM, disk I/O, and network bandwidth available to your client applications and helps to spread the load across more machines. Several metrics and statistics can inform your decision:

Increase RAM capacity
One of the most important components in a Couchbase Server cluster is the amount of RAM available. RAM not only stores application data and supports the Couchbase Server caching layer; the server also actively uses it for other operations. A reduction in the overall available RAM may therefore cause performance problems elsewhere. The following are common indicators for increasing RAM capacity within the cluster:
  • If you see an increasing number of disk fetches, your application is requesting more and more data that is not resident in RAM and must be read from disk. Increasing the RAM in the cluster allows it to keep more data in memory and provides better performance to your application.
  • If you want to add more buckets to the Couchbase Server cluster, you might need more RAM to do so. Adding nodes increases the overall capacity of the system; you can then shrink the RAM quotas of existing buckets to make room for new ones.
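The two RAM indicators above can be expressed as simple checks. The following Python is a minimal sketch, assuming you already collect per-bucket read and disk-fetch counts over a sampling window; the `BucketStats` fields, the 5% threshold, and the helper names are hypothetical and are not part of any Couchbase API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class BucketStats:
    """Hypothetical per-bucket counters for one sampling window (not a Couchbase API)."""
    gets: int          # read operations in the window
    disk_fetches: int  # reads that had to be served from disk


def cache_miss_ratio(stats: BucketStats) -> float:
    """Fraction of reads that missed RAM and went to disk."""
    if stats.gets == 0:
        return 0.0
    return stats.disk_fetches / stats.gets


def needs_more_ram(history: list[BucketStats], threshold: float = 0.05) -> bool:
    """True when the miss ratio never decreases across samples and ends above threshold.

    The 5% default is an illustrative assumption, not a Couchbase recommendation.
    """
    ratios = [cache_miss_ratio(s) for s in history]
    if len(ratios) < 2:
        return False
    non_decreasing = all(a <= b for a, b in zip(ratios, ratios[1:]))
    return non_decreasing and ratios[-1] > threshold


def can_add_bucket(node_quota_mb: int, bucket_quotas_mb: list[int], new_bucket_mb: int) -> bool:
    """True if a new bucket fits in the per-node RAM quota alongside existing buckets."""
    return sum(bucket_quotas_mb) + new_bucket_mb <= node_quota_mb


# Usage: the miss ratio climbs from 2% to 9%, so more RAM is indicated.
samples = [BucketStats(1000, 20), BucketStats(1000, 60), BucketStats(1000, 90)]
print(needs_more_ram(samples))                   # True
print(can_add_bucket(4096, [1024, 2048], 2048))  # False
```

Looking at a trend rather than a single sample avoids reacting to a one-off burst of cold reads.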
Increase disk I/O throughput
By adding nodes to the Couchbase Server cluster, you will increase the aggregate amount of disk I/O that can be performed across the cluster. This is especially important in high-write environments, but can also be a factor when you need to read large amounts of data from the disk.
Increase disk capacity
You can either add more disk space to the current nodes, or add more nodes to add aggregate disk space to the cluster.
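Whether you are short of disk throughput or disk space, the node count you need can be estimated by dividing total demand by what one node can supply. This Python sketch assumes that every write and every stored byte is multiplied by the number of copies (one active plus `replicas`), and that you keep a fixed fraction of each disk free; the per-node capacity figures, the 30% headroom, and the function names are illustrative assumptions, so substitute measurements from your own hardware.

```python
import math


def copies(replicas: int) -> int:
    """One active copy plus the configured number of replicas."""
    return 1 + replicas


def nodes_for_write_io(writes_per_sec: float, per_node_writes_per_sec: float,
                       replicas: int = 1) -> int:
    """Nodes needed so that aggregate disk writes, including replicas, fit."""
    total = writes_per_sec * copies(replicas)
    return math.ceil(total / per_node_writes_per_sec)


def nodes_for_disk_space(data_gb: float, per_node_disk_gb: float,
                         replicas: int = 1, headroom: float = 0.3) -> int:
    """Nodes needed so that all copies fit while leaving `headroom` of each disk free."""
    total = data_gb * copies(replicas)
    usable_per_node = per_node_disk_gb * (1 - headroom)
    return math.ceil(total / usable_per_node)


# Usage with made-up figures:
print(nodes_for_write_io(20_000, 15_000))  # 40,000 writes/s in total, prints 3
print(nodes_for_disk_space(500, 400))      # 1,000 GB over about 280 GB usable each, prints 4
```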
Increase network bandwidth
If you are at, or close to, the point of saturating the cluster's network bandwidth, that is a very strong indicator that you need more nodes. Adding nodes spreads the overall network traffic across more machines, which reduces the bandwidth each individual node must handle.
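The effect of adding nodes on network load can be approximated by assuming traffic spreads evenly across nodes, which real workloads only roughly do. The Python sketch below uses a hypothetical 70% utilization ceiling per network interface; both the ceiling and the function names are assumptions for illustration.

```python
import math


def per_node_bandwidth(total_mbps: float, nodes: int) -> float:
    """Average traffic per node if cluster traffic is spread evenly."""
    if nodes < 1:
        raise ValueError("a cluster needs at least one node")
    return total_mbps / nodes


def nodes_for_bandwidth(total_mbps: float, nic_mbps: float,
                        max_utilization: float = 0.7) -> int:
    """Smallest node count that keeps each NIC at or below max_utilization."""
    return math.ceil(total_mbps / (nic_mbps * max_utilization))


# Usage: adding a fourth node lowers the average per-node load.
print(per_node_bandwidth(9_000, 3))         # 3000.0
print(per_node_bandwidth(9_000, 4))         # 2250.0
print(nodes_for_bandwidth(20_000, 10_000))  # 3
```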

When to shrink your cluster

Choosing to shrink a Couchbase cluster is a more subjective decision. It is usually based upon cost considerations, or a change in application requirements that doesn't require such a large cluster to support the required load.

When choosing whether to shrink a cluster:

  • Ensure that you have enough capacity in the remaining nodes to support your dataset and application load. Removing nodes may have a significant detrimental effect on your cluster if there are not enough nodes.
  • Avoid removing multiple nodes at once if you are trying to determine the ideal cluster size. Instead, remove nodes one at a time to understand the impact of each removal on the cluster as a whole.
  • Remove and rebalance a node rather than failing it over. Failover is intended for a node that has failed and is not coming back to the cluster: it immediately promotes the corresponding replica vBuckets on the remaining nodes to active. If you fail over a healthy node, any data that was still in flight to the replicas during that operation may be lost. Removing the node and rebalancing ensures that all data is properly replicated and continuously available.
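The first two guidelines, checking remaining capacity and removing one node at a time, can be combined into a simple planning loop. This Python is a sketch under stated assumptions: `Node`, `can_remove`, and `shrink_plan` are hypothetical names, and `required_ram_mb` stands for whatever RAM your dataset, replicas, and workload need, which you must measure yourself. Each entry in the returned plan represents one remove-and-rebalance step, after which you would observe the cluster before continuing.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Node:
    name: str
    ram_quota_mb: int  # data RAM this node contributes (hypothetical field)


def can_remove(nodes: list[Node], name: str, required_ram_mb: int) -> bool:
    """True if the nodes left after removing `name` still provide required_ram_mb."""
    remaining = [n for n in nodes if n.name != name]
    if len(remaining) == len(nodes):
        raise ValueError(f"unknown node: {name}")
    return sum(n.ram_quota_mb for n in remaining) >= required_ram_mb


def shrink_plan(nodes: list[Node], candidates: list[str], required_ram_mb: int) -> list[str]:
    """Order of removals, one node per rebalance, stopping before capacity runs out."""
    current = list(nodes)
    plan: list[str] = []
    for name in candidates:
        if not can_remove(current, name, required_ram_mb):
            break
        current = [n for n in current if n.name != name]
        plan.append(name)
    return plan


cluster = [Node(h, 8192) for h in ("a", "b", "c", "d")]
print(shrink_plan(cluster, ["d", "c"], required_ram_mb=20_000))  # ['d']
```

Stopping at the first removal that would drop below the requirement mirrors the advice to shrink incrementally rather than all at once.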

When to rebalance your cluster

Once you decide to add or remove nodes, consider the following:

  • If you are planning on adding or removing multiple nodes in a short period of time, it is best to add or remove them all at once and then kick off a single rebalance operation, rather than rebalancing after each change. This reduces the overall load placed on the system as well as the amount of data that needs to be moved.
  • Choose a quiet time for adding nodes. While the rebalancing operation is meant to be performed online, it is not a “free” operation and will undoubtedly put increased load on the system as a whole in the form of disk I/O, network bandwidth, CPU resources, and RAM usage.
  • Voluntary rebalancing (that is, rebalancing that is not part of a failover situation) should be performed during a period of low system usage. Rebalancing is a comparatively resource-intensive operation because data is redistributed around the cluster, so avoid rebalancing during heavy usage periods to prevent a detrimental effect on overall cluster performance.
  • Rebalancing moves large amounts of data around the cluster. More available RAM allows the operating system to cache more disk access, which lets the rebalance complete much faster. If there is not enough memory in your cluster, rebalancing may be very slow. Do not wait for your cluster to reach full capacity before adding new nodes and rebalancing.
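The guidelines above amount to three checks: batch pending node changes into a single rebalance, start it inside a low-usage window, and expand before memory runs out. The following Python is a minimal sketch of that logic only; `RebalancePlan`, `in_quiet_window`, `should_expand`, the 80% threshold, and the `example.com` host names are hypothetical placeholders and do not call any Couchbase tool or API. Define the quiet window from your own traffic history.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import time


@dataclass
class RebalancePlan:
    """Collects node additions and removals so they share one rebalance."""
    to_add: list[str] = field(default_factory=list)
    to_remove: list[str] = field(default_factory=list)

    def add_node(self, host: str) -> None:
        self.to_add.append(host)

    def remove_node(self, host: str) -> None:
        self.to_remove.append(host)

    def rebalances_needed(self) -> int:
        """One rebalance covers the whole batch; none if nothing is pending."""
        return 1 if (self.to_add or self.to_remove) else 0


def in_quiet_window(now: time, start: time, end: time) -> bool:
    """True if `now` falls in [start, end); handles windows that cross midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def should_expand(used_mb: int, quota_mb: int, threshold: float = 0.8) -> bool:
    """Flag expansion well before RAM is full (threshold is an assumed example)."""
    return used_mb / quota_mb >= threshold


plan = RebalancePlan()
plan.add_node("node5.example.com")
plan.add_node("node6.example.com")
print(plan.rebalances_needed())                               # 1
print(in_quiet_window(time(2, 30), time(23, 0), time(5, 0)))  # True
print(should_expand(7_000, 8_192))                            # True
```

Expanding at a threshold below 100% leaves the operating system spare memory for caching disk access while the rebalance runs.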