Recover Partitions from a Remote Cluster

Recover Partitions from a Remote Cluster

Data recovery from remote clusters requires XDCR and enough memory and disk space.

If more nodes in a cluster catastrophically fail than the number of replicas, data partitions in that cluster will no longer be available. For example, if you have a four-node cluster with one replica and two nodes fail and are unrecoverable, some data shards will no longer be available.

Data recovery from remote clusters requires an XDCR environment and adequate amount of memory and disk space to support the workload and recovered data.

To recover the missing data partition in case of node failure and when vBuckets are unavailable, follow these steps.

  1. For each failed node, click on Server Nodes > Fail Over.

    You will see whether data is unavailable and which vBuckets are unavailable. If you do not have enough replicas for the number of the failed-over nodes, some vBuckets will no longer be available:

  2. Add new functioning nodes to replace the failed nodes.

    Do not rebalance after you add new nodes to a cluster. Typically, you will rebalance after adding nodes to the cluster, but in this scenario the rebalance will destroy the information about the missing vBuckets and you cannot recover it.

    In this example, two nodes failed in a three-node cluster and a new node at 10.3.3.61 was added. If you are certain that your cluster can easily handle the workload and recovered data, you may choose to skip this step.

  3. Run cbrecovery to recover data from your backup cluster. In the Server Panel, the Stop Recovery button appears.

    After the recovery completes, this button disappears.

  4. Rebalance your cluster.

    Once the recovery is completed, you can rebalance your cluster. Rebalancing will recreate replica vBuckets and redistribute them evenly across the cluster.

Recovery Dry-Run

Preview a list of buckets that are no longer available in the cluster.

Before you recover vBuckets, you may want to preview a list of buckets that are no longer available in the cluster. Use this command and options:

shell> ./cbrecovery http://Administrator:password@10.3.3.72:8091 http://Administrator:password@10.3.3.61:8091 -n

The administrative credentials are provided for the node in the cluster, as well as the option -n. The command returns a list of vBuckets in the remote secondary cluster that are no longer in the first cluster. If there are any unavailable buckets in the cluster with failed nodes, you will see output as follows:

2013-04-29 18:16:54,384: MainThread Missing vbuckets to be recovered:[{"node": "ns_1@10.3.3.61",
        "vbuckets": [513, 514, 515, 516, 517, 518, 519, 520, 521, 522, 523, 524, 525, 526,, 528, 529,
        530, 531, 532, 533, 534, 535, 536, 537, 538, 539, 540, 541, 542, 543, 544, 545,, 547, 548,
        549, 550, 551, 552, 553, 554, 555, 556, 557, 558, 559, 560, 561, 562, 563, 564, 565, 566, 567,
        568, 569, 570, 571, 572,....

In this case, the vbuckets array contains all the vBuckets that are no longer available in the cluster. These are the buckets you can recover from the remotes cluster using the following command:

shell> ./cbrecovery http://Administrator:password@<From_IP>:8091 \
        http://Administrator:password@<To_IP>:8091 -B bucket_name

You can run the command either on the cluster with unavailable vBuckets, or on the remote cluster.

Provide the hostname, port, and credentials for the remote cluster and the cluster with missing vBuckets, in that order. If you do not provide the parameter -B, the tool assumes you will recover unavailable vBuckets for the default bucket.

Monitor the Recovery Process

Full, Cluster, Read-only, and Replication Administrators can use the Couchbase Web Console to monitor the recovery process.

To monitor the progress of recovery:

  1. Click on the Data Buckets tab.
  2. Select the data bucket you are recovering from the Data Buckets drop-down list.
  3. Click on the Summary drop-down list to see more details about this data bucket. You will see an increased number of items during recovery:

  4. You can also see the number of active vBuckets increase as they are recovered until you reach 1024 vBuckets.

    Click on the vBucket Resources drop-down:

    Since this tool runs from the command line, you can stop it at any time.

  5. The Stop Recovery button appears in the Servers panels. If you click this button, you will stop the recovery process between clusters. Once the recovery process completes, this button will no longer appear, and you will need to rebalance the cluster. You can also stop it in this panel:

  6. After the recovery completes, click on the Server Nodes tab and then on Rebalance to rebalance your cluster. When cbrecovery finishes, it will output a report in the console:
      Recovery :                Total |    Per sec
                batch    :                 0000 |       14.5
                byte     :                 0000 |      156.0
                msg      :                 0000 |       15.6
                4 vbuckets recovered with elapsed time 10.90 seconds

In this report: batch is a group of internal operations performed by cbrecovery, byte indicates the total number of bytes recovered, and msg is the number of documents recovered.