Upgrade a Couchbase Server cluster in a manner that is best suited to your operations.
Couchbase Server can be upgraded in multiple ways each with their own pros and cons. Which way is best for your cluster will depend on a few factors in your architecture and company organization (e.g. SLAs). On this and subsequent pages of this section we will discuss not just the mechanics of an upgrade, but why each may or may work best for you.
Option #1 - Rolling Online Upgrade
A rolling online upgrade is the recommended upgrade process for a Couchbase Server cluster and should be chosen unless you have special circumstances that would rule it out. As an online upgrade, this method avoids downtime for the database and applications because all data operations continue normally throughout the process. Rolling upgrades involve using the rebalance feature of Couchbase to redistribute the data amongst the nodes in the cluster running the Data Service, thus keeping the data accessible for the duration of the upgrade. As long as you keep some nodes from the Index and Query Services
As Couchbase Server is specifically designed to provide fully online upgrades If you find this is not the case, please report a bug.
There are three options for rolling online upgrades:
- Swap Rebalance
- This method entails introducing new nodes into a Couchbase Server cluster as you remove an equal number of nodes to be upgraded. It uses a feature called Swap Rebalance to move vBuckets efficiently from an existing node running the Data Service to a new node that will run the Data Service, without involving other nodes in the cluster.
- As long as you swap an equal number of nodes (e.g. one for one, two for two, etc.), a Swap Rebalance will be used. This also keeps the cluster capacity consistent so as to not interfere with the load running on the cluster. While this method is the safest and provides the most availability, one downside to this option is the overall duration of a cluster upgrade may be longer as compared to other options. If the speed of an upgrade is a primary concern for your cluster, as it might be if you have a tight maintenance window, see Graceful Failover or Performing the Offline Upgrade.
- Regular Rebalance
- This method is suitable when you must complete the upgrade using only the nodes currently in the cluster. It involves removing at least one running node from the cluster, upgrading the node, and then adding it back in. This upgrade method reduces the capacity of the cluster during the upgrade process.
- Like a swap rebalance upgrade, this style will require multiple rebalances to complete.
- Graceful Failover
- This option involves performing a rolling online upgrade using a Graceful Failover instead of the full addition and removal of nodes in option #1. It is typically faster and less resource intensive because vBuckets do not need to be moved between nodes. With this process, there is a cluster rebalance but it is only to synchronize the vBuckets on the upgraded node when each node returns to the cluster. Another advantage compared to the other online upgrades is that this method preserves the global secondary indexes and doesn’t need to rebuild them.
- The primary downside to this option is increased risk as replicas are used for faster failover and not recreated until a rebalance occurs. This option is not available when choosing to upgrade with net-new systems (as in the case of many cloud deployments) since those new nodes would not have the previous nodes’ data in place. Use Option #1 when upgrading with net-new systems.
Option #2 - Upgrade Using Inter-cluster Replication
For this option, another Couchbase Server cluster is created or already exists and then inter-cluster replication is enabled via the Cross Datacenter Replication (XDCR) built-in feature of Couchbase Server. The administrator then transitions the application to use the new cluster as part of the upgrade process. While this upgrade process is relatively straightforward to set up, it requires more investment in servers/instances and networking, as well as a change to the application so that it uses the newly upgraded cluster. It is best used when an administrator needs an immediate rollback position or does not want to tinker with the original cluster for some reason. This method is recommended when an existing XDCR connection is already in place.
Some people modify their application, to make this can be a zero downtime upgrade by having their application aware of both clusters and can failover between them automatically. If you do this, keep in mind that unlike regular operations against the Data Service, XDCR is eventually consistent. Most people treat this as an offline upgrade and will roll the application over with the new cluster configurations. Which solution is best will depend on your SLA requirements.
Option #3 - Offline Upgrade
Choose an offline upgrade when the situation calls for an easy and fast upgrade method as well as when the database can incur a controlled outage. The offline upgrade is more likely to succeed in situations where an online upgrade option might fail, but also the rare time a cluster is unstable and has been determined that a Couchbase Server upgrade will fix a specific issue.
The offline upgrade is the best choice in situations where an online upgrade option might fail, such as the rare times a cluster is unstable and it has been determined that an upgrade will fix the issue.
Choosing the Upgrade Strategy
Both the online and offline upgrade processes have trade-offs. The following table illustrates some important aspects of the two upgrade strategies.
|Feature||Online upgrade||Offline upgrade|
|Applications remain available||Yes||No|
|Cluster stays in operation||Yes||No|
|Cluster must be shut down||No||Yes|
|Typically required steps||Rebalance, upgrade, rebalance per node||All cluster nodes are upgraded at once.|