Proactive monitoring and alerting are essential to managing a healthy Couchbase environment. While the Couchbase Web Console provides detailed statistics and basic alerting functionality, it is not intended to be a real-time dashboard and should not be used as the primary operational monitoring utility.
Integration with external monitoring systems is required for two primary purposes: proactive alerting and high-resolution trending. The external monitoring system should be capable of setting alert thresholds on a per-metric basis. As the values of most metrics are workload- and environment-specific, you will need to establish a baseline for what is "normal" for your use cases. Trending the Couchbase metrics helps establish the baseline values, and alerts can then be configured to fire when point-in-time values fall outside the "normal" range. Trended metrics also allow Couchbase administrators to observe resource consumption over time, indicating when scaling events will become necessary.
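As a minimal sketch of the baseline idea, the following assumes the trended values for a single metric are already available as a list of numbers. The function names and the three-standard-deviation band are illustrative assumptions, not Couchbase recommendations; an appropriate band depends on your workload.

```python
from statistics import mean, pstdev


def baseline_range(history, k=3.0):
    """Return (low, high) bounds: mean +/- k population standard deviations."""
    if not history:
        raise ValueError("history must contain at least one sample")
    centre = mean(history)
    spread = pstdev(history)
    return centre - k * spread, centre + k * spread


def is_anomalous(value, history, k=3.0):
    """True when a point-in-time value falls outside the baseline range."""
    low, high = baseline_range(history, k)
    return value < low or value > high
```

An external monitoring system would normally perform this comparison itself; the sketch only shows how a trended history turns into an alert threshold.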
This learning path describes how to poll the Couchbase REST API to obtain metrics for an external monitoring system, describes which metrics are most important to monitor, and provides guidance on how to interpret those metrics.
Couchbase exposes monitoring metrics via REST APIs with responses returned in JSON format. There are two types of statistical APIs available: Cluster Manager (port 8091/18091) stats and service-specific administrative stats.
Cluster Manager stats provide statistical sampling for a given service and/or entity at a particular interval. Each response from a /stats endpoint contains a timestamp property recording when each sample was taken, which correlates directly to each of the available stats.
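To illustrate how the timestamp property lines up with the individual stats, the sketch below pairs each timestamp with the sample at the same position. The response layout (`op.samples` holding parallel arrays) and the `cmd_get` stat are assumptions modeled on a bucket stats response; the payload is a hand-written, trimmed example and should be checked against the output of your Couchbase version.

```python
import json

# Hypothetical, trimmed response; real payloads contain many more stats.
SAMPLE_RESPONSE = json.loads("""
{
  "op": {
    "samples": {
      "timestamp": [1700000000000, 1700000001000, 1700000002000],
      "cmd_get": [10, 12, 9]
    }
  }
}
""")


def series_for(response, stat_name):
    """Pair each timestamp with the stat value sampled at that time."""
    samples = response["op"]["samples"]
    return list(zip(samples["timestamp"], samples[stat_name]))
```

Calling `series_for(SAMPLE_RESPONSE, "cmd_get")` yields a list of `(timestamp, value)` tuples that can be forwarded to a time-series store.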
Every Cluster Manager endpoint supports two optional query string parameters:
The zoom parameter determines the interval of samples to return in the response. It provides the following granularity:
zoom=minute (default) - Every second for the last minute (60 samples)
zoom=hour - Every four (4) seconds for the last hour (900 samples)
zoom=day - Every minute for the last day (1440 samples)
zoom=week - Every ten (10) minutes for the last week, which in practice covers eight (8) days (1152 samples)
zoom=year - Every six (6) hours for the last year (1464 samples)
Due to sample frequency, the number of samples returned may vary by one (±1).
The haveTStamp parameter requests statistics from the given timestamp until the current time. It is specified as UNIX epoch time in milliseconds.
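The two query string parameters can be combined when building a request URL. The following is a minimal sketch: the helper name `stats_url`, the host, and the bucket name `demo` in the usage note are hypothetical, and the accepted zoom values are limited to those listed above.

```python
from urllib.parse import urlencode

# Zoom levels described in this document.
ZOOM_LEVELS = {"minute", "hour", "day", "week", "year"}


def stats_url(base_url, path, zoom=None, since_ms=None):
    """Build a stats URL with the optional zoom and haveTStamp parameters."""
    params = {}
    if zoom is not None:
        if zoom not in ZOOM_LEVELS:
            raise ValueError(f"unsupported zoom level: {zoom!r}")
        params["zoom"] = zoom
    if since_ms is not None:
        # haveTStamp is UNIX epoch time in milliseconds.
        params["haveTStamp"] = int(since_ms)
    query = urlencode(params)
    url = base_url.rstrip("/") + path
    return f"{url}?{query}" if query else url
```

For example, `stats_url("http://localhost:8091", "/pools/default/buckets/demo/stats", zoom="hour")` requests the four-second samples for the last hour. A current timestamp in milliseconds can be obtained with `int(time.time() * 1000)`.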
To limit the results when using the zoom parameter, post-process the results. For example, if you need samples from the last five (5) minutes, set the zoom parameter to one hour and retrieve the last 75 entries from the JSON list.
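The post-processing step can be sketched as follows, assuming the samples have already been parsed into a dictionary of equal-length lists keyed by stat name (an assumed layout). The interval table restates the zoom granularity given earlier; the function names are illustrative.

```python
import math

# Seconds between samples for each zoom level, as listed in this document.
ZOOM_INTERVAL_SECONDS = {
    "minute": 1,
    "hour": 4,
    "day": 60,
    "week": 600,
    "year": 21600,
}


def samples_needed(window_seconds, zoom):
    """Number of trailing samples that cover the requested time window."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    return math.ceil(window_seconds / ZOOM_INTERVAL_SECONDS[zoom])


def last_window(samples, window_seconds, zoom):
    """Keep only the most recent samples of every stat for the window."""
    count = samples_needed(window_seconds, zoom)
    return {name: values[-count:] for name, values in samples.items()}
```

With a five-minute window (300 seconds) and `zoom="hour"`, `samples_needed` returns 75, matching the figure above.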
The REST APIs should be polled once per minute, either via a local agent or remotely using each node's IP address or hostname. Couchbase REST APIs must be accessed using administrative account credentials; a Read-Only Administrator is recommended for this purpose.
As most of the metrics provided by the REST API are per-node, it is necessary to query every node in the cluster.
Limit the number of requests per API when querying metrics, i.e. return all bucket metrics in one request rather than issuing separate requests per metric. Heavy use of the Couchbase REST APIs can have CPU utilization impacts on the cluster.
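A polling loop that follows this guidance might look like the sketch below: one request per node per API, with the whole response kept rather than one request per metric. The function names, the plain-HTTP port 8091 URL scheme, and the credentials in the usage note are assumptions for illustration; `fetch` is injected so the HTTP client can be swapped or stubbed.

```python
import base64
import json
import urllib.request


def http_get_json(url, username, password, timeout=10):
    """Fetch a URL with HTTP basic auth and decode the JSON body."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    request = urllib.request.Request(
        url, headers={"Authorization": f"Basic {token}"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def poll_nodes(nodes, path, fetch):
    """Issue a single request per node for `path`; collect results and errors."""
    results, errors = {}, {}
    for node in nodes:
        url = f"http://{node}:8091{path}"
        try:
            results[node] = fetch(url)
        except Exception as exc:  # an unreachable node should not abort the poll
            errors[node] = str(exc)
    return results, errors
```

In use, `fetch` could be `lambda url: http_get_json(url, "monitor", "secret")`, where `monitor` is a hypothetical Read-Only Administrator account, and `poll_nodes` would be scheduled once per minute. Recording errors per node lets the monitoring system alert on unreachable nodes separately from metric thresholds.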
Some monitoring systems are capable of discovering new monitoring targets and automatically defining the monitoring profile to be applied. Couchbase supports this by exposing cluster membership, MDS service assignment, and service ports via the Data Service Node API.
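As a rough sketch of target discovery, the function below groups node hostnames by service. It assumes a node-listing response containing a `nodes` array whose entries carry `hostname`, `services`, and `status` fields; these field names and the `"healthy"` status value are assumptions and should be verified against the actual API output for your version.

```python
def targets_by_service(cluster_info, only_healthy=True):
    """Map each service name to the hostnames of the nodes running it."""
    targets = {}
    for node in cluster_info.get("nodes", []):
        # Assumed field: "status" is "healthy" for nodes that should be polled.
        if only_healthy and node.get("status") != "healthy":
            continue
        for service in node.get("services", []):
            targets.setdefault(service, []).append(node["hostname"])
    return targets
```

The resulting mapping can be used to decide which monitoring profile applies to each node, for example polling index metrics only on nodes that run the index service.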
Each section in the list describes the monitoring metrics exposed by a Couchbase service, explains each metric, and gives guidance on interpreting it along with possible operational responses. Alerts should be configured to be sent from the external monitoring system when metric values fall outside the expected range.
Each guide contains examples of how to call an endpoint and parse the results. These examples use jq, a lightweight command-line parser for JSON. jq is not required and is used for example purposes only; it is available as a separate download.
Couchbase provides a reference monitoring implementation to demonstrate interacting with the available REST APIs.
The following monitoring systems have plugins available for Couchbase. Note that these are third-party integrations; they may be incomplete and may not follow the best practices set forth in this document.