Monitoring XDCR Timestamp-based Conflict Resolution

Monitoring XDCR Timestamp-based Conflict Resolution

Monitoring Time Drift Using Statistics

It is very important to actively monitor various aspects related to timestamp-based conflict resolution.

Various aspects of time based conflict resolution can be monitored through the Web Console and using the cbstats command.

The primary monitoring feature on the Web Console -> Data Buckets page are graphs which display an average time drift for the cluster nodes. This data is displayed in two graphs, one showing the drift for replica vBuckets and the other shows drift for the active vBuckets. A single cluster calculates drift during document replication, each replica vBucket calculates drift as it receives data from its active peer. When a cluster is the destination for XDCR traffic, active vBuckets will calculate drift from their remote cluster peers.

It is normal for a cluster with closely synchronized clocks to show some drift; in general it will be showing how long it took a mutation to be replicated and should remain steady. It is also normal for the active vBucket drift to be zero if no XDCR relationship exists (or if no XDCR traffic is flowing).

The Web Console -> Data Buckets page graphs the following statistics related to XDCR and timestamp-based conflict resolution:
  • Incoming XDCR ops/sec.
  • Average active vBucket drift per mutation in seconds - Graph shows the bucket’s ep_active_hlc_drift statistic.
  • Average replica vBucket drift per mutation in seconds - Graph shows the bucket’s ep_replica_hlc_drift statistic
  • Active vBucket ahead exceptions - Graph shows how many mutations have a drift greater than the vBucket’s drift ahead threshold.
  • Active vBucket behind exceptions - Graph shows how many mutations have a negative drift less than the vBucket’s drift behind threshold.

For more detailed information and to view the statistics used by the Web Console, many more statistics can be viewed using the cbstats utility.

The most detailed information is stored per vBucket and accessed via cbstats vbucket-details.
  • max_cas - the vBucket’s current maximum hybrid logical clock timestamp. In general, this statistic shows the value issued to the last mutation or in certain cases the largest timestamp the vBucket has received (when the received timestamp is ahead of the local clock).
  • max_cas_str - is the max_cas displayed as a human readable ISO-8601 timestamp (UTC).
  • total_abs_drift - “Total Absolute Drift” is the accumulated drift observed by the vBucket. Drift is always accumulated as an absolute value.
  • total_abs_drift_count - how many updates have been applied to total_abs_drift, for the purpose of average or rate calculations.
  • drift_ahead_threshold - the threshold at which positive drift will trigger an update to drift_ahead_exceeded. The value is displayed in nanoseconds.
  • drift_behind_threshold - the threshold at which positive drift will trigger an update to drift_behind_exceeded. The value is displayed in nanoseconds as a positive value, but is converted to a negative value for actual exception checks.
  • drift_ahead_threshold_exceeded - How many mutations have been observed with a drift above the drift_ahead_threshold.
  • drift_behind_threshold_exceeded - How many mutations have been observed with a drift below the drift_behind_threshold.
  • logical_clock_ticks - Stores how many times the hybrid logical clock has had to increment the logical clock.
Some of the detailed vBucket statistics are summarized in the bucket statistics, accessed using cbstats all.
  • ep_active_hlc_drift - The sum of total_abs_drift for the node’s active vBuckets.
  • ep_active_hlc_drift_count - The sum of total_abs_drift_count for the node’s active vBuckets.
  • ep_replica_hlc_drift - The sum of total_abs_drift for the node’s active vBuckets.
  • ep_replica_hlc_drift_count - The sum of total_abs_drift_count for the node’s active vBuckets.
  • ep_active_ahead_exceptions - The sum of drift_ahead_exceeded for the node’s active vBuckets.
  • ep_active_behind_exceptions - The sum of drift_behind_exceeded for the node’s active vBuckets.
  • ep_replica_ahead_exceptions - The sum of drift_ahead_exceeded for the node’s replica vBuckets.
  • ep_replica_behind_exceptions - The sum of drift_behind_exceeded for the node’s replica vBuckets.

What Happens When Your Physical Clocks Are No Longer in Sync

If your physical clocks are drifting ahead and the drift is larger than the threshold of 5 seconds (5000 milliseconds), then an alert is raised on the destination cluster with the following message: “ [<DATE>] - Remote or replica mutation received for bucket "<BUCKET>" on node "<IP>" with timestamp more than 5000 milliseconds ahead of local clock. Please ensure that NTP is set up correctly on all nodes across the replication topology and clocks are synchronized.

Important: If you see this alert then contact the Couchbase Technical Support team immediately.