Couchbase provides several built-in alerts that fire when the cluster is approaching a critical failure or when a critical failure has occurred. Enable the built-in email alerts and configure them to be sent to multiple recipients or a distribution list. Treat these alerts as a fail-safe that backs up proactive alerting from an external monitoring service.
Some environments do not permit Couchbase nodes to send email. This table provides the log-based equivalent of the built-in Couchbase email alerts.
Logs can be monitored via REST using the http://<server>:8091/logs endpoint or via the /opt/couchbase/var/lib/couchbase/logs/info.log file. Alerts can be generated by applying a regular expression that matches either the module/code combination or the string noted below.
Alert | Description | Code |
---|---|---|
Node was auto-failed-over | The sending node has been failed over automatically. | auto_failover_node |
Maximum number of auto-failed-over nodes was reached | The auto-failover system stops auto-failover when the maximum number of spare nodes available has been reached. | auto_failover_maximum_reached |
Node wasn't auto-failed-over as other nodes are down at the same time | Auto-failover does not take place if there is already a node down. | auto_failover_other_nodes_down |
Node was not auto-failed-over as there are not enough nodes in the cluster running the same service | Auto-failover is not supported with fewer than three nodes. | auto_failover_cluster_too_small |
Node was not auto-failed-over as auto-failover for one or more services running on the node is disabled | Auto-failover does not take place on a node when auto-failover is disabled for one or more services running on that node. | auto_failover_disabled |
Node's IP address has changed unexpectedly | The IP address of the node has changed, which may indicate a network interface, operating system, or other network or system failure. | ip |
Disk space used for persistent storage has reached at least 90% of capacity | The disk device configured for storage of persistent data is nearing full capacity. | disk |
Metadata overhead is more than 50% | The amount of data required to store the metadata information for your dataset is now greater than 50% of the available RAM. | overhead |
Bucket memory on a node is entirely used for metadata | All the available RAM on a node is being used to store the metadata for the objects stored. This means that there is no memory available for caching values. With no memory left for storing metadata, further requests to store data will also fail. Only applicable to buckets configured for value-only ejection. | ep_oom_errors |
Writing data to disk for a specific bucket has failed | The disk or device used for persisting data has failed to store persistent data for a bucket. | ep_item_commit_failed |
Writing event to audit log has failed | The audit log event writing has failed. | audit_dropped_events |
Approaching full Indexer RAM warning | Indexer RAM usage is approaching its limit. | indexer_ram_max_usage |
Remote mutation timestamp exceeded drift threshold | The timestamp of a remote mutation exceeded the drift threshold. | ep_clock_cas_drift_threshold_exceeded |
Communication issues among some nodes in the cluster | Some nodes in the cluster are having trouble communicating with each other. | communication_issue |
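As a rough illustration of the regular-expression approach described above, the sketch below filters log lines for the alert codes in the table. The helper name `match_alert_codes` is hypothetical, and the sketch assumes each code appears verbatim in the log text, which may not hold for every Couchbase version. The short codes `ip`, `disk`, and `overhead` are deliberately left out because, as bare substrings, they would match a large share of unrelated lines.

```shell
#!/bin/sh
# Alert codes copied from the table above, joined into one alternation.
# "ip", "disk", and "overhead" are omitted: too generic to match as substrings.
ALERT_CODES='auto_failover_node|auto_failover_maximum_reached|auto_failover_other_nodes_down|auto_failover_cluster_too_small|auto_failover_disabled|ep_oom_errors|ep_item_commit_failed|audit_dropped_events|indexer_ram_max_usage|ep_clock_cas_drift_threshold_exceeded|communication_issue'

# Hypothetical helper: prints every input line that mentions one of the codes.
# It reads stdin, so it works with a file, a pipe, or `tail -F`.
# Like grep, it exits with status 1 when nothing matches.
match_alert_codes() {
    grep -E "(${ALERT_CODES})"
}

# Example (assumes the default log location named in this document):
#   match_alert_codes < /opt/couchbase/var/lib/couchbase/logs/info.log
```

Reading from stdin keeps the filter independent of where the log lines come from, so the same function could be fed from the log file or from the output of a REST call.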
The log messages shown in the Admin UI (http://localhost:8091/ui/index.html#!/logs) are also available via the REST API.
The Logs API supports the following query string parameters:
Param | Description |
---|---|
limit | An integer greater than 0 that limits the overall number of messages returned |
sinceTime | Epoch timestamp in milliseconds to start returning messages from |
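Because sinceTime is expressed in epoch milliseconds, a caller usually has to convert from the seconds that most tools report. The following is a minimal sketch of that conversion. The helper names `since_time_ms` and `logs_url` are hypothetical, and it assumes a `date` that supports `+%s` (GNU and BSD `date` both do).

```shell
#!/bin/sh
# Hypothetical helper: epoch milliseconds for "N minutes ago".
# $1 = minutes to look back
# $2 = reference time in epoch seconds (optional, defaults to now)
since_time_ms() {
    minutes=$1
    now_s=${2:-$(date +%s)}
    echo $(( (now_s - minutes * 60) * 1000 ))
}

# Hypothetical helper: builds a Logs API URL carrying both query parameters.
# $1 = limit, $2 = sinceTime in epoch milliseconds
logs_url() {
    printf 'http://localhost:8091/logs?limit=%s&sinceTime=%s\n' "$1" "$2"
}

# Example: request up to 100 messages from the last 15 minutes.
#   curl --silent --user Administrator:password "$(logs_url 100 "$(since_time_ms 15)")"
```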
Each message in the "list" array of the response has the following properties:
Property | Description |
---|---|
code | A code specified by the module or 0 |
module | The module that generated the log message |
node | The node that the message came from |
serverTime | An ISO-8601 timestamp of when the message was logged |
shortText | A short string describing the log entry, most commonly "message", "node up", or "node down" |
text | The detailed log message |
tstamp | An Epoch timestamp of when the message was logged |
type | The type of log message, values can be: info, warning, critical |
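The following example retrieves up to 100 log messages and prints each one on a single line. The --get option makes curl send limit as a query string parameter rather than in the request body:
curl \
  --user Administrator:password \
  --silent \
  --get \
  --data limit=100 \
  http://localhost:8091/logs | \
  jq -r '.list[] |
    "[" + .type + "] " + .serverTime +
    " Module: " + .module +
    " Code: " + (.code | tostring) +
    " Message: " + .text
  '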
To show only critical messages, filter on the type property:
curl \
  --user Administrator:password \
  --silent \
  --get \
  --data limit=100 \
  http://localhost:8091/logs | \
  jq -r '.list[] | select(.type == "critical") |
    "[" + .type + "] " + .serverTime +
    " Module: " + .module +
    " Code: " + (.code | tostring) +
    " Message: " + .text
  '
To show only warning messages:
curl \
  --user Administrator:password \
  --silent \
  --get \
  --data limit=100 \
  http://localhost:8091/logs | \
  jq -r '.list[] | select(.type == "warning") |
    "[" + .type + "] " + .serverTime +
    " Module: " + .module +
    " Code: " + (.code | tostring) +
    " Message: " + .text
  '
To show both critical and warning messages:
curl \
  --user Administrator:password \
  --silent \
  --get \
  --data limit=100 \
  http://localhost:8091/logs | \
  jq -r '.list[] | select(.type == "critical" or .type == "warning") |
    "[" + .type + "] " + .serverTime +
    " Module: " + .module +
    " Code: " + (.code | tostring) +
    " Message: " + .text
  '
Critical alerts that trigger email alerts are also displayed to users in the Admin UI when they log in. If email is not an option, these alerts can be monitored through the REST API instead.
Alerts are located at the root of the http://localhost:8091/pools/default response payload in the "alerts" property, which is an array.
Property | Description |
---|---|
msg | The alert message and details |
serverTime | The time the alert was issued |
curl \
--user Administrator:password \
--silent \
--request GET \
http://localhost:8091/pools/default | \
jq -r '.alerts[] | .serverTime + " - " + .msg'
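For use from an external monitoring service, the formatted output of the command above could be turned into a pass/fail check. The sketch below is one possible approach, not part of Couchbase. The helper name `check_alerts` is hypothetical, and the exit status of 2 for "critical" follows the common Nagios plugin convention, which is an assumption about the monitoring tool in use.

```shell
#!/bin/sh
# Hypothetical helper: reads one formatted alert per line on stdin
# (for example, the "serverTime - msg" lines produced by the jq filter above).
# Prints each alert and returns 2 if any were seen, otherwise returns 0.
check_alerts() {
    count=0
    while IFS= read -r line; do
        # Skip blank lines so an empty response counts as "no alerts".
        [ -n "$line" ] || continue
        count=$((count + 1))
        printf 'ALERT: %s\n' "$line"
    done
    if [ "$count" -eq 0 ]; then
        echo "OK: no active alerts"
        return 0
    fi
    return 2
}

# Example: pipe the output of the curl | jq command above into the check.
#   ... | jq -r '.alerts[] | .serverTime + " - " + .msg' | check_alerts
```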