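Airbyte is an open-source data integration platform that moves data between a wide range of sources and destinations. With Airbyte's Couchbase connectors, you can use Couchbase as both a data source and a destination, enabling integration scenarios such as replicating data between Couchbase buckets or clusters, consolidating data from external sources into Couchbase, and feeding Couchbase data to downstream systems.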
Note: Airbyte is designed for batch/periodic data synchronization, not sub-second real-time change tracking. Sync intervals vary based on your Airbyte deployment type and plan. For true real-time CDC, consider Couchbase's built-in XDCR or Eventing services.
This tutorial will guide you through setting up Airbyte with Couchbase Capella (cloud-hosted) as both source and destination, covering configuration, sync modes, common patterns, and best practices.
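Couchbase Source Connector: discovers every collection in a bucket automatically and supports Full Refresh and Incremental sync, with Incremental sync relying on the `last_modified` xattr.
Couchbase Destination Connector: supports the Overwrite, Append, and Append Dedup sync modes and creates missing collections automatically.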
Before starting this tutorial, ensure you have:
You'll need access to one of the following:
This tutorial assumes you have:
The Couchbase source connector allows Airbyte to extract data from your Couchbase buckets. It automatically discovers all collections within a bucket and creates individual streams for each.
What is a stream? In Airbyte, a stream represents a single data source (in this case, a Couchbase collection) that can be synced to a destination. Each stream has its own schema, sync mode, and cursor configuration. Learn more in Airbyte's documentation.
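Create a database user for Airbyte, for example `airbyte_source_user` (or your preferred name), and copy your cluster's connection string (for example, `couchbases://cb.xxxxxx.cloud.couchbase.com`).
In Airbyte, go to Sources on the left sidebar.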
Click + New Source.
Search for Couchbase and select it.
Fill in the configuration fields:
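- Source name: a descriptive name, for example `Couchbase Production`.
- Connection string: `couchbases://cb.xxxxxx.cloud.couchbase.com`
- Username: your database user (for example, `airbyte_source_user`).
- Password: that user's password.
- Bucket: the bucket to read from (for example, `travel-sample`).
- Start date: the beginning of the data window to sync, for example `2025-01-01T00:00:00Z`.
Click Set up source.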
Airbyte will test the connection and automatically discover available streams.
After successful connection, Airbyte discovers all collections in your bucket and creates a stream for each.
Stream Naming Convention: bucket.scope.collection
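Couchbase bucket names may contain periods, while scope and collection names may not, so a stream name is safest to split from the right. The sketch below is a hypothetical helper (`parse_stream_name` is not part of the connector) for scripts that need to map a stream back to its keyspace:

```python
from typing import NamedTuple


class StreamName(NamedTuple):
    bucket: str
    scope: str
    collection: str


def parse_stream_name(stream: str) -> StreamName:
    """Split a `bucket.scope.collection` stream name into its three parts.

    Bucket names may contain periods but scope and collection names may not,
    so we split from the right at most twice.
    """
    parts = stream.rsplit(".", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"not a bucket.scope.collection name: {stream!r}")
    return StreamName(*parts)


if __name__ == "__main__":
    print(parse_stream_name("travel-sample.inventory.airline"))
    # A bucket name that itself contains a period still parses correctly.
    print(parse_stream_name("my.bucket.inventory.route"))
```

For example, `parse_stream_name("travel-sample.inventory.airline").collection` returns `"airline"`.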
Example streams from a travel-sample bucket:
- `travel-sample.inventory.airline`
- `travel-sample.inventory.airport`
- `travel-sample.inventory.hotel`
- `travel-sample.inventory.route`
Stream Schema: Each stream includes:
{
  "_id": "string",                 // Document key
  "_ab_cdc_updated_at": "integer", // Modification timestamp (the incremental sync cursor)
  "bucket": {                      // Keyed by the bucket name
    // Original document fields
  }
}
The Couchbase source connector supports two sync modes:
Full Refresh: Syncs all documents from the collection on every sync.
When to use:
Performance note: Full Refresh transfers all data on each sync, regardless of what changed.
Incremental: Syncs only documents that are new or modified since the last sync.
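As a rough illustration of cursor-based selection, here is an in-memory sketch. It assumes `_ab_cdc_updated_at` is an integer timestamp in epoch milliseconds (the unit is an assumption), and that on the first sync a `start_date` such as `2025-01-01T00:00:00Z` sets the lower bound. `select_incremental` and the integer state are hypothetical; the connector's actual query and comparison may differ:

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, Optional

CURSOR = "_ab_cdc_updated_at"
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def iso_to_epoch_ms(value: str) -> int:
    """Convert an ISO 8601 timestamp such as '2025-01-01T00:00:00Z' to epoch milliseconds."""
    # Older Pythons' fromisoformat rejects a trailing 'Z', so rewrite it as an explicit offset.
    dt = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if dt.tzinfo is None:  # treat timestamps without an offset as UTC
        dt = dt.replace(tzinfo=timezone.utc)
    return (dt - EPOCH) // timedelta(milliseconds=1)


def select_incremental(
    records: Iterable[dict],
    state: Optional[int],
    start_date: Optional[str] = None,
) -> tuple[list[dict], Optional[int]]:
    """Pick records whose cursor is at or after the saved state (or start_date on the first sync)."""
    if state is not None:
        lower = state
    elif start_date is not None:
        lower = iso_to_epoch_ms(start_date)
    else:
        lower = None  # first sync without a start_date: take everything

    selected = [r for r in records if lower is None or r[CURSOR] >= lower]
    # The new state is the highest cursor value emitted; keep the old state if nothing matched.
    new_state = max((r[CURSOR] for r in selected), default=state)
    return selected, new_state
```

Because the comparison is inclusive (`>=`), a record whose cursor equals the saved state is emitted again on the next sync. That trades an occasional duplicate for never skipping a document modified in the same millisecond as the last sync, and an Append Dedup destination absorbs the duplicate.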
How it works:
Tracks changes with the `_ab_cdc_updated_at` cursor field.
When to use:
Requirements:
The `last_modified` xattr must be present on all documents.
The Couchbase destination connector allows Airbyte to load data into your Couchbase buckets from various sources.
Create a database user for the destination, for example `airbyte_dest_user` (or your preferred name), with Read/Write access.
Note: These Database Access credentials are used for cluster connections via the SDK. They are distinct from Capella API credentials, which are used for Capella management operations.
If using the same cluster as your source, network access is already configured. Otherwise, follow the same IP allowlisting steps from Part 1.
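In Airbyte, add a new Couchbase destination and fill in the configuration fields:
- Destination name: a descriptive name, for example `Couchbase Destination`.
- Connection string: `couchbases://cb.xxxxxx.cloud.couchbase.com`
- Username: your destination database user (for example, `airbyte_dest_user`).
- Password: that user's password.
- Bucket: the bucket to write to (for example, `staging`).
- Scope: the scope to write to (for example, `_default`).
Airbyte will test the connection by creating a temporary collection and performing a test write.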
The Couchbase destination connector supports three sync modes:
Overwrite: Clears the destination collection before each sync and replaces its contents with the new data.
How it works:
Deletes existing documents with `DELETE FROM bucket.scope.collection`, then writes the new records.
Document ID format:
`{stream_name}::{primary_key_value}` when a primary key is configured, otherwise `{stream_name}::{uuid4()}`
When to use:
Warning: All existing data in the collection is deleted on each sync!
Append: Adds every synced record as a new document and never updates existing ones.
How it works:
Document ID format: {stream_name}::{uuid4()}
The `stream_name::` prefix ensures document ID uniqueness when multiple Airbyte streams write to the same Couchbase collection, preventing ID collisions between different source streams.
Example: Without the prefix, two streams that each contain a record with `id=123` would both write document `123` (a collision). With the prefix, `streamA::123` and `streamB::123` remain separate.
When to use:
Note: This mode will continuously grow your collection with every sync.
Append Dedup: Keeps one document per primary key, updating the existing document when the same key is synced again.
How it works:
Document ID format: {stream_name}::{primary_key_values_joined}
Example with primary key ["id"]:
`id=123` → Document ID: `airline::123`
When to use:
Requirements: Primary key must be configured in the connection settings.
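The three destination modes differ mainly in how document IDs are built and whether existing documents get replaced. The sketch below models that behavior against an in-memory dict standing in for a collection. It is a hypothetical illustration, not the connector's code: `write_records`, the mode labels, and especially the separator used to join composite key values (`_` here) are assumptions, since the exact join format is not specified in this guide:

```python
import uuid
from typing import Optional

Collection = dict[str, dict]  # stand-in for a Couchbase collection: document ID -> document
PrimaryKey = list[list[str]]  # Airbyte-style key: a list of field paths, e.g. [["country"], ["city"]]

KEY_JOINER = "_"  # ASSUMPTION: separator for composite key values; not specified in this guide


def key_value(record: dict, primary_key: PrimaryKey) -> str:
    """Resolve each field path in the primary key and join the values into one string."""
    values = []
    for path in primary_key:
        value = record
        for field in path:  # walk nested fields, e.g. ["address", "city"]
            value = value[field]
        values.append(str(value))
    return KEY_JOINER.join(values)


def document_id(stream: str, record: dict, mode: str, primary_key: Optional[PrimaryKey]) -> str:
    """Build a `{stream_name}::...` document ID as each destination mode is described."""
    if mode == "append" or not primary_key:
        return f"{stream}::{uuid.uuid4()}"  # random suffix: every record becomes a new document
    return f"{stream}::{key_value(record, primary_key)}"


def write_records(collection: Collection, stream: str, records: list[dict],
                  mode: str, primary_key: Optional[PrimaryKey] = None) -> None:
    """Simulate one sync into an in-memory collection for 'overwrite', 'append' or 'append_dedup'."""
    if mode not in ("overwrite", "append", "append_dedup"):
        raise ValueError(f"unknown mode: {mode}")
    if mode == "append_dedup" and not primary_key:
        raise ValueError("append_dedup requires a primary key")
    if mode == "overwrite":
        collection.clear()  # mirrors DELETE FROM bucket.scope.collection before writing
    for record in records:
        # Assigning by ID behaves like an upsert: a repeated key replaces the earlier document.
        collection[document_id(stream, record, mode, primary_key)] = record


if __name__ == "__main__":
    coll: Collection = {}
    rows = [{"id": 123, "name": "A"}, {"id": 123, "name": "A (renamed)"}]
    write_records(coll, "airline", rows, "append_dedup", [["id"]])
    print(coll)  # a single document, airline::123, holding the later record
```

Running the same rows with `"append"` instead produces two documents with random `airline::` IDs, which is why Append collections keep growing.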
All documents written to Couchbase by Airbyte follow this structure:
{
"id": "stream_name::key_value",
"type": "airbyte_record",
"stream": "source_stream_name",
"_airbyte_extracted_at": 1642526400000,
"data": {
// Original record data from source
},
"_ab_sync_mode": "append_dedup",
"namespace": "optional_namespace"
}
Fields Explained:
- `id`: Composite document ID (based on sync mode and primary key)
- `type`: Always `"airbyte_record"`
- `stream`: Name of the source stream
- `_airbyte_extracted_at`: Unix timestamp (milliseconds) when Airbyte extracted the record from the source
- `data`: The actual record data from the source
- `_ab_sync_mode`: Which sync mode was used
- `namespace`: Optional logical grouping
Automatic Collection Creation: If a collection doesn't exist, the connector creates it automatically.
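To see how the envelope fields fit together, here is a minimal sketch that wraps a source record in the structure above. `wrap_record` is a hypothetical helper that only mirrors the documented field layout; it is not the connector's implementation, and omitting `namespace` when it is unset is an assumption:

```python
import time
from typing import Optional


def wrap_record(doc_id: str, stream: str, data: dict, sync_mode: str,
                namespace: Optional[str] = None,
                extracted_at_ms: Optional[int] = None) -> dict:
    """Build a document shaped like the Airbyte record envelope described above."""
    if extracted_at_ms is None:
        extracted_at_ms = time.time_ns() // 1_000_000  # current Unix time in milliseconds
    doc = {
        "id": doc_id,
        "type": "airbyte_record",  # constant for every record
        "stream": stream,
        "_airbyte_extracted_at": extracted_at_ms,
        "data": data,  # the original source record, unchanged
        "_ab_sync_mode": sync_mode,
    }
    if namespace is not None:  # ASSUMPTION: the optional namespace is left out when unset
        doc["namespace"] = namespace
    return doc
```

Since the source fields live under `data`, queries against the destination need to reference them with that prefix, for example `data.country` rather than `country`.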
Collection Naming: Stream names are sanitized:
Now that you've configured both source and destination, let's create a connection to sync data.
Airbyte displays all discovered streams from your source. For each stream:
For each enabled stream, select the appropriate sync mode combination:
Sync Mode Matrix:
| Source Mode | Destination Mode | Result | Use Case |
|---|---|---|---|
| Full Refresh | Overwrite | Complete replacement each sync | Mirror source exactly |
| Full Refresh | Append | Multiple complete snapshots | Historical snapshots |
| Incremental | Append | All changes tracked | Complete audit trail |
| Incremental | Append Dedup | Current state maintained | Live replica |
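The matrix, together with the requirements listed for each mode, can be expressed as a small pre-flight check when planning a connection. This is a hypothetical helper for reasoning about configuration, not an Airbyte API, and the mode labels are illustrative strings:

```python
from typing import Optional

# Source/destination combinations from the matrix above, with their outcome.
SYNC_MATRIX = {
    ("full_refresh", "overwrite"): "Complete replacement each sync",
    ("full_refresh", "append"): "Multiple complete snapshots",
    ("incremental", "append"): "All changes tracked",
    ("incremental", "append_dedup"): "Current state maintained",
}


def check_sync_config(source_mode: str, dest_mode: str,
                      cursor_field: Optional[str] = None,
                      primary_key: Optional[list[list[str]]] = None) -> list[str]:
    """Return a list of problems with the chosen combination (an empty list means OK)."""
    problems = []
    if (source_mode, dest_mode) not in SYNC_MATRIX:
        problems.append(f"{source_mode} + {dest_mode} is not one of the combinations in the matrix")
    if source_mode == "incremental" and not cursor_field:
        problems.append("incremental sync needs a cursor field such as _ab_cdc_updated_at")
    if dest_mode == "append_dedup" and not primary_key:
        problems.append("append_dedup needs a primary key such as [['_id']]")
    return problems


if __name__ == "__main__":
    print(check_sync_config("incremental", "append_dedup", "_ab_cdc_updated_at", [["_id"]]))  # []
```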
For incremental syncs:
Set the cursor field to `_ab_cdc_updated_at`.
Primary keys are required for the Append Dedup destination mode:
Single-field primary key: `[["_id"]]`
Composite primary key (multiple fields): `[["country"], ["city"]]`
For Couchbase → Couchbase:
Use `_id` as the primary key to keep document keys consistent between source and destination.
Example Configuration:
- Stream: `travel-sample.inventory.airline`
- Cursor field: `_ab_cdc_updated_at`
- Primary key: `[["_id"]]`
Choose when syncs should run:
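Custom cron expression (Airbyte uses Quartz cron syntax: seconds, minutes, hours, day of month, month, day of week):
0 0 0/4 * * ?    # Every 4 hours
0 0 2 * * ?      # Daily at 2 AM
0 0 0 ? * SUN    # Weekly on Sunday at midnight
Predefined intervals: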
Connection Name: Give your connection a descriptive name, for example `Production to Staging - Incremental Sync`.
Namespace Configuration:
Namespace Example:
source_namespace: inventory
destination: staging bucket, inventory scope
Normalization: Not supported for the Couchbase destination.
Data Transformation: dbt transformations are not available for the Couchbase destination.
What Happens During First Sync:
Watch the sync job status:
Sync Statistics:
After the sync completes:
Use Airbyte to replicate data between different Couchbase buckets within the same cluster or across different clusters. This pattern is useful for creating staging environments, maintaining backup copies, or distributing data across geographic regions. Configure the source connector to read from one bucket and the destination connector to write to another bucket using appropriate sync modes based on your requirements.
Use Airbyte to consolidate data from multiple external sources into Couchbase. Centralizing data in Couchbase gives you its flexible JSON document model, SQL++ (formerly N1QL) queries, and built-in Full-Text Search. This pattern suits unified data platforms where Couchbase serves as your operational data store.
Use Airbyte to share Couchbase data with downstream systems for specialized workloads like reporting or archival. While Couchbase can handle most operational and analytical needs directly, this pattern enables integration with legacy systems or specialized tools that require data feeds from your Couchbase cluster.
Use Incremental When:
Use Full Refresh When:
For the initial incremental sync, use start_date to limit the data window:
{
"start_date": "2025-01-01T00:00:00Z"
}
Benefits:
For large initial syncs, pre-create collections to avoid collection creation overhead during sync.
Off-Peak Scheduling: Schedule large syncs during low-traffic periods.
# Daily full refresh at 2 AM (Quartz cron syntax)
0 0 2 * * ?
# Incremental sync every 30 minutes during business hours, Monday to Friday
0 0/30 9-17 ? * MON-FRI
Parallel Syncs: Airbyte can sync multiple streams in parallel. Monitor cluster resources:
Always use couchbases:// for production:
For Airbyte Cloud:
For Self-Hosted Airbyte:
The connector uses configurable timeout settings for key-value and query operations.
If you experience timeouts, consider:
Replicas: Consider replica count for destination buckets
Compression: Enable compression for large documents
Track these Capella metrics during syncs:
Set up alerts for high CPU, memory, and disk queue thresholds based on your cluster's normal operating levels.
For Capella:
Append vs Append Dedup:
Optimize Sync Frequency:
Database Users:
Permissions:
Source User: Read access
Destination User: Read/Write access
IP Allowlisting:
Connection String:
- Use `couchbases://` (TLS encrypted).
- Avoid `couchbase://` (unencrypted) in production.
For Sensitive Data:
Compliance Considerations:
Maintain consistent field types across documents:
// Good - consistent types
{"order_id": "12345", "total": 99.99}
{"order_id": "12346", "total": 149.99}
// Bad - inconsistent types
{"order_id": "12345", "total": 99.99}
{"order_id": 12346, "total": "149.99"}
Good Primary Keys:
Examples:
- `user_id`
- `order_id`
- `[["user_id"], ["event_timestamp"]]` (composite key)
Avoid:
The destination connector converts nulls to empty strings.
Use ISO 8601 format for all timestamps:
{
"created_at": "2025-01-15T14:30:00Z",
"updated_at": "2025-01-20T09:15:30Z"
}
Benefits:
Monitor sync health in the Airbyte UI:
Connection Status:
Key Metrics to Track:
During Active Syncs, Monitor:
Cluster Metrics (Settings → Metrics):
Bucket Statistics:
Query Performance (Query → Workbench): Monitor query execution times during sync periods
"Connection check failed": Verify IP is allowlisted in Capella, connection string format is correct (couchbases://cb.xxxxx.cloud.couchbase.com), and credentials are valid.
"Network timeout": Check firewall rules and ensure Airbyte can reach Capella on port 11207.
"No streams discovered": Ensure bucket has collections with documents and user has read permissions.
"Incremental sync not detecting changes": Reset connection state in Airbyte (Connection Settings → Advanced → Reset Data).
"Collection creation failed": Verify user has Read/Write access and scope exists.
"Batch write timeout": Scale up Capella cluster or reduce sync frequency.
"Syncs are very slow": Switch to Incremental mode, disable unused streams, and ensure Airbyte and Capella are in the same region.
In Airbyte:
Common Log Patterns:
ERROR - Failed to connect: timeout
→ Network/connectivity issue
ERROR - Schema validation failed
→ Data type mismatch
WARN - Retrying batch write (attempt 2/3)
→ Temporary issue; may resolve itself
Use the Couchbase SDK or cbshell to test the connection independently:
# Using cbshell
cbshell -c couchbases://cb.xxxxx.cloud.couchbase.com \
  -u username -p password
If this fails, the issue is with Couchbase access, not Airbyte.
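For a dependency-free first check before reaching for the SDK, the sketch below parses a connection string, flags one that is not `couchbases://`, and tries a plain TCP connection to the Key-Value TLS port 11207 mentioned in the troubleshooting notes. It only proves the port is reachable from where it runs (run it on the Airbyte host) and does not authenticate. The function names are hypothetical, and the parser handles hostnames and IPv4 addresses only:

```python
import socket
from typing import NamedTuple


class ConnInfo(NamedTuple):
    tls: bool
    hosts: list[str]


def parse_connection_string(connstr: str) -> ConnInfo:
    """Parse 'couchbases://host1,host2?opt=x' into a TLS flag and host names."""
    scheme, sep, rest = connstr.partition("://")
    if not sep or scheme not in ("couchbase", "couchbases"):
        raise ValueError(f"expected couchbase:// or couchbases://, got {connstr!r}")
    host_part = rest.split("?", 1)[0].rstrip("/")  # drop query options such as ?timeout=...
    hosts = [h.split(":", 1)[0] for h in host_part.split(",") if h]  # strip any :port suffix
    if not hosts:
        raise ValueError("connection string has no hosts")
    return ConnInfo(tls=(scheme == "couchbases"), hosts=hosts)


def can_reach(host: str, port: int = 11207, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timeout, DNS failure, etc.
        return False
```

For example, `info = parse_connection_string("couchbases://cb.xxxxx.cloud.couchbase.com")` followed by `can_reach(info.hosts[0])` returns `False` when an allowlist or firewall rule is blocking the Airbyte host, which narrows a "Network timeout" down to network access rather than credentials.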
After a sync, spot-check data in Couchbase via the Capella console Documents browser.
If incremental sync is stuck or behaving incorrectly:
Warning: This clears sync state and forces a complete re-sync.
Use the Capella Query Workbench to query system keyspaces, such as `system:active_requests`, to see which queries are active during sync operations.
Airbyte Community:
connector: source-couchbase or connector: destination-couchbase
Couchbase Community: