Monitoring Logs on Couchbase Server

Learn about built-in email alerts and how to monitor them via the REST API when email isn't an option
Explore the Couchbase Logs API and see parameters and properties available
View several example requests that show how to fetch log messages with varying degrees of severity

Built-in Email Alerts and Logs

Couchbase provides several built-in alerts that fire when the server is approaching a critical failure or when a critical failure has occurred. Enable the built-in email alerts and configure them to be sent to multiple recipients or a distribution list. Treat these alerts as a fail-safe that backs up proactive alerting from an external monitoring service, not as a replacement for it.

Some environments do not permit Couchbase nodes to send email. The Available Alerts table below lists the log-based equivalent of each built-in Couchbase email alert.

Logs can be monitored via REST using the http://<server>:8091/logs endpoint (or https://<server>:18091/logs over TLS), or by reading the /opt/couchbase/var/lib/couchbase/logs/info.log file. Alerts can be generated by applying a regular expression that matches either the module/code combination or the string noted below.

Available Alerts

Alert	Description	Code
Node was auto-failed-over	The sending node has been failed over automatically.	auto_failover_node
Maximum number of auto-failed-over nodes was reached	The auto-failover system stops auto-failover when the maximum number of spare nodes available has been reached.	auto_failover_maximum_reached
Node wasn't auto-failed-over as other nodes are down at the same time	Auto-failover does not take place if there is already a node down.	auto_failover_other_nodes_down
Node was not auto-failed-over as there are not enough nodes in the cluster running the same service	Auto-failover cannot be supported with fewer than three nodes.	auto_failover_cluster_too_small
Node was not auto-failed-over as auto-failover for one or more services running on the node is disabled	Auto-failover does not take place on a node when auto-failover is disabled for one or more services running on that node.	auto_failover_disabled
Node's IP address has changed unexpectedly	The IP address of the node has changed, which may indicate a network interface, operating system, or other network or system failure.	ip
Disk space used for persistent storage has reached at least 90% of capacity	The disk device configured for storage of persistent data is nearing full capacity.	disk
Metadata overhead is more than 50%	The amount of data required to store the metadata information for your dataset is now greater than 50% of the available RAM.	overhead
Bucket memory on a node is entirely used for metadata	All the available RAM on a node is being used to store the metadata for the objects stored. This means that there is no memory available for caching values. With no memory left for storing metadata, further requests to store data will also fail. Only applicable to buckets configured for value-only ejection.	ep_oom_errors
Writing data to disk for a specific bucket has failed	The disk or device used for persisting data has failed to store persistent data for a bucket.	ep_item_commit_failed
Writing event to audit log has failed	The audit log event writing has failed.	audit_dropped_events
Approaching full Indexer RAM warning	Indexer RAM usage is approaching the configured limit.	indexer_ram_max_usage
Remote mutation timestamp exceeded drift threshold	The timestamp of a remote mutation exceeded the drift threshold.	ep_clock_cas_drift_threshold_exceeded
Communication issues among some nodes in the cluster	Some nodes in the cluster are experiencing communication issues.	communication_issue
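
The regular-expression matching described above can be sketched as a small shell helper that scans a log file for the alert codes in the table. This is a minimal sketch, not a definitive implementation: find_alerts and ALERT_PATTERN are hypothetical names, and it assumes the codes appear literally in the log text. The short codes (ip, disk, overhead) are left out on purpose, since as plain substrings they would match many unrelated lines.

```shell
#!/bin/sh
# Alert codes copied from the Available Alerts table.
# Assumption: the code appears verbatim somewhere in the log line.
# The short codes ip, disk and overhead are omitted to avoid false matches.
ALERT_PATTERN='auto_failover_(node|maximum_reached|other_nodes_down|cluster_too_small|disabled)|ep_oom_errors|ep_item_commit_failed|audit_dropped_events|indexer_ram_max_usage|ep_clock_cas_drift_threshold_exceeded|communication_issue'

# find_alerts FILE
# Prints every line of FILE that mentions one of the alert codes.
# Exit status follows grep: 0 if at least one line matched, 1 if none did.
find_alerts() {
  grep -E -- "$ALERT_PATTERN" "$1"
}
```

Running find_alerts /opt/couchbase/var/lib/couchbase/logs/info.log from a scheduled job lets the exit status drive the alert: 0 means at least one matching line was found.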

Logs API

The log messages shown in the Admin UI (http://localhost:8091/ui/index.html#!/logs) are also available via a REST API.

Insecure: http://localhost:8091/logs
Secure: https://localhost:18091/logs

API Parameters

The Logs API supports the following query string parameters:

Param	Description
limit	An integer greater than 0 that limits the overall number of messages returned
sinceTime	Epoch timestamp in milliseconds to start returning messages from
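
Because sinceTime is expressed in epoch milliseconds, a polling script usually has to convert from seconds first. The sketch below shows one way to build a request URL that asks for messages from the last N minutes. The helper names since_ms and logs_url are hypothetical, and the usage line assumes a date command that supports +%s (GNU and BSD both do).

```shell
#!/bin/sh
# since_ms NOW_SECONDS MINUTES
# Prints the epoch timestamp, in milliseconds, that lies MINUTES before NOW_SECONDS.
since_ms() {
  echo $(( ($1 - $2 * 60) * 1000 ))
}

# logs_url BASE LIMIT SINCE_MS
# Prints a Logs API URL carrying both query string parameters.
logs_url() {
  printf '%s/logs?limit=%s&sinceTime=%s\n' "$1" "$2" "$3"
}
```

For example, the URL printed by logs_url http://localhost:8091 100 "$(since_ms "$(date +%s)" 15)" can be passed to curl to request up to 100 messages logged in the last 15 minutes.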

Log Response Properties

Property	Description
code	A code specified by the module or 0
module	The module that generated the log message
node	The node that the message came from
serverTime	An ISO-8601 timestamp of when the message was logged
shortText	A short string describing the log entry, most commonly "message", "node up", or "node down"
text	The detailed log message
tstamp	An Epoch timestamp of when the message was logged
type	The type of log message, values can be: info, warning, critical
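
A monitoring check often needs to reduce a batch of messages to a single status. The sketch below ranks the three documented type values and reports the most severe one seen. The numeric ranks and the helper names type_rank and worst_rank are assumptions made for illustration, not part of the API; any unrecognized value is ranked like info.

```shell
#!/bin/sh
# type_rank TYPE
# Prints a numeric rank for a log "type": critical=2, warning=1, anything else=0.
type_rank() {
  case "$1" in
    critical) echo 2 ;;
    warning)  echo 1 ;;
    *)        echo 0 ;;
  esac
}

# worst_rank
# Reads one type per line on stdin and prints the highest rank seen (0 if empty).
worst_rank() {
  worst=0
  while IFS= read -r t; do
    rank=$(type_rank "$t")
    if [ "$rank" -gt "$worst" ]; then
      worst=$rank
    fi
  done
  echo "$worst"
}
```

Feeding it one type per line, for instance the output of jq -r '.list[].type', yields 2 if any critical message is present, 1 if the worst is a warning, and 0 otherwise.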

Example: All Log Messages

curl \
  --user Administrator:password \
  --silent \
  --get \
  --data limit=100 \
  http://localhost:8091/logs | \
  jq -r '.list[] |
  "[" + .type + "] " + .serverTime +
  " Module: " + .module +
  " Code: " + (.code | tostring) +
  " Message: " + .text
  '

Example: Critical Messages Only

curl \
  --user Administrator:password \
  --silent \
  --get \
  --data limit=100 \
  http://localhost:8091/logs | \
  jq -r '.list[] | select(.type == "critical") |
  "[" + .type + "] " + .serverTime +
  " Module: " + .module +
  " Code: " + (.code | tostring) +
  " Message: " + .text
  '

Example: Warning Messages Only

curl \
  --user Administrator:password \
  --silent \
  --get \
  --data limit=100 \
  http://localhost:8091/logs | \
  jq -r '.list[] | select(.type == "warning") |
  "[" + .type + "] " + .serverTime +
  " Module: " + .module +
  " Code: " + (.code | tostring) +
  " Message: " + .text
  '

Example: Critical or Warning Messages Only

curl \
  --user Administrator:password \
  --silent \
  --get \
  --data limit=100 \
  http://localhost:8091/logs | \
  jq -r '.list[] | select(.type == "critical" or .type == "warning") |
  "[" + .type + "] " + .serverTime +
  " Module: " + .module +
  " Code: " + (.code | tostring) +
  " Message: " + .text
  '

Alerts API

Critical alerts that trigger email alerts are also displayed to users in the Admin UI upon logging in. These alerts can be monitored through the REST API when email is not an option.

Insecure: http://localhost:8091/pools/default
Secure: https://localhost:18091/pools/default

Alerts are returned at the root of the response payload in the "alerts" property, which is an array.

Alert Properties

Property	Description
msg	The alert message and details
serverTime	The time the alert was issued

Example: Retrieve All Alerts

curl \
  --user Administrator:password \
  --silent \
  --request GET \
  http://localhost:8091/pools/default | \
  jq -r '.alerts[] | .serverTime + " - " + .msg'
