Using Airbyte with Couchbase as Source and Destination

Learn to configure Airbyte with Couchbase as both source and destination
Understand sync modes and data replication strategies
Set up bidirectional data pipelines with Couchbase

Overview

Airbyte is an open-source data integration platform that enables you to move data between various sources and destinations. With Airbyte's Couchbase connectors, you can use Couchbase as both a data source and destination, enabling powerful data integration scenarios including:

Cross-bucket replication: Sync data between buckets within the same or different Couchbase clusters
Data ingestion: Load data from SaaS applications, databases, or APIs into Couchbase
Change data capture: Track and replicate document changes with periodic syncs

Note: Airbyte is designed for batch/periodic data synchronization, not sub-second real-time change tracking. Sync intervals vary based on your Airbyte deployment type and plan. For true real-time CDC, consider Couchbase's built-in XDCR or Eventing services.

This tutorial will guide you through setting up Airbyte with Couchbase Capella (cloud-hosted) as both source and destination, covering configuration, sync modes, common patterns, and best practices.

What You'll Learn

How to configure Couchbase as an Airbyte source
How to configure Couchbase as an Airbyte destination
Understanding different sync modes and when to use them
Creating and managing data pipelines
Performance optimization and troubleshooting

About the Connectors

Couchbase Source Connector:

Version: 0.1.8 (Community support)
Minimum Couchbase version: 7.0+
Supported sync modes: Full Refresh, Incremental (Append)
Change tracking: Based on document last_modified xattr

Couchbase Destination Connector:

Version: 0.1.9 (Community support)
Minimum Couchbase version: 7.0+
Supported sync modes: Overwrite, Append, Append Dedup
Document handling: Automatic collection creation

Prerequisites

Before starting this tutorial, ensure you have:

Couchbase Capella Setup

Active Capella Account: Sign up at cloud.couchbase.com if you don't have an account
Running Cluster: At least one operational cluster
Bucket Created: One or more buckets with scopes and collections
Database User: Users with appropriate permissions:
- For Source: Read access
- For Destination: Read/Write access
Network Access: IP allowlist configured (we'll cover this in setup)

Airbyte Setup

You'll need access to one of the following:

Airbyte Cloud: Managed service at cloud.airbyte.com (recommended for beginners)
Airbyte Open Source: Self-hosted instance following the Airbyte deployment guide

Network Requirements

Network connectivity between Airbyte and your Couchbase cluster
For Capella: Airbyte's IP address(es) added to the allowlist
Ports: 11207 (for couchbases://) or 11210 (for couchbase://)

Knowledge Prerequisites

This tutorial assumes you have:

Basic familiarity with Couchbase concepts (buckets, scopes, collections)
Understanding of JSON data structures
Basic knowledge of data synchronization concepts
Access to the Airbyte user interface

Part 1: Configuring Couchbase as a Source

The Couchbase source connector allows Airbyte to extract data from your Couchbase buckets. It automatically discovers all collections within a bucket and creates individual streams for each.

What is a stream? In Airbyte, a stream represents a single data source (in this case, a Couchbase collection) that can be synced to a destination. Each stream has its own schema, sync mode, and cursor configuration. Learn more in Airbyte's documentation.

Step 1: Prepare Your Couchbase Source

Create a Database User

Log in to your Couchbase Capella console
Navigate to Settings → Database Access
Click Create Database Access
Configure the user:
- Username: airbyte_source_user (or your preferred name)
- Password: Generate a secure password
- Bucket Access: Select the source bucket
- Access: Select "Read" access for the bucket
Save the credentials securely

Configure Network Access

In Capella, go to Settings → Allowed IP Addresses
Click Add Allowed IP
Add your Airbyte instance IP address:
- For Airbyte Cloud: Check Airbyte's IP addresses documentation
- For self-hosted: Add your Airbyte server's public IP
Save the configuration

Get Your Connection String

In Capella, navigate to your cluster
Click Connect
Copy the connection string (format: couchbases://cb.xxxxxx.cloud.couchbase.com)
Keep this handy for the next step

Step 2: Configure the Source in Airbyte

In Airbyte, go to Sources on the left sidebar.
Click + New Source.
Search for Couchbase and select it.
Fill in the configuration fields:
- Source Name: Give your source a descriptive name, e.g. Couchbase Production.
- Connection String: Enter your Capella connection string, e.g.
```
couchbases://cb.xxxxxx.cloud.couchbase.com
```
- Username: Enter your database username (e.g., airbyte_source_user).
- Password: Enter the corresponding password.
- Bucket: Specify the source bucket name (e.g., travel-sample).
- Start Date (Optional): For incremental syncs, provide a UTC date in ISO 8601 format (e.g.,
```
2025-01-01T00:00:00Z
```
  )
  - Only documents modified after this date will be included in the first sync.
  - Leave empty to sync all documents.
Click Set up source.

Airbyte will test the connection and automatically discover available streams.

Step 3: Understanding Stream Discovery

After successful connection, Airbyte discovers all collections in your bucket and creates a stream for each.

Stream Naming Convention: bucket.scope.collection

Example streams from a travel-sample bucket:

travel-sample.inventory.airline
travel-sample.inventory.airport
travel-sample.inventory.hotel
travel-sample.inventory.route

Stream Schema: Each stream includes:

{
  "_id": "string",              // Document key
  "_ab_cdc_updated_at": "integer",  // Modification timestamp (for incremental sync)
  "bucket": {                   // Bucket name
    // Original document fields
  }
}

Understanding Source Sync Modes

The Couchbase source connector supports two sync modes:

Full Refresh

Syncs all documents from the collection every time.

When to use:

Small collections that change frequently
When you need a complete snapshot every time
When incremental tracking is not required

Performance note: Transfers all data on each sync, regardless of changes

Incremental (Append)

Syncs only new or modified documents since the last sync.

How it works:

Uses the _ab_cdc_updated_at cursor field
Tracks the last synced timestamp
Queries only documents modified after that timestamp

When to use:

Large collections
When you want to reduce data transfer
When tracking changes over time

Requirements:

Couchbase automatically maintains the last_modified xattr on all documents

Part 2: Configuring Couchbase as a Destination

The Couchbase destination connector allows Airbyte to load data into your Couchbase buckets from various sources.

Step 1: Prepare Your Couchbase Destination

Create a Database User

In Couchbase Capella, navigate to Settings → Database Access
Click Create Database Access
Configure the user:
- Username: airbyte_dest_user (or your preferred name)
- Password: Generate a secure password
- Bucket Access: Select the destination bucket
- Access: Select "Read/Write" access for the bucket
Save the credentials

Note: These Database Access credentials are used for cluster connections via the SDK, distinct from Capella API credentials which would be used for Capella management operations.

Ensure Network Access

If using the same cluster as your source, network access is already configured. Otherwise, follow the same IP allowlisting steps from Part 1.

Step 2: Configure the Destination in Airbyte

In Airbyte, go to the Destinations tab in the left navigation.
Click + New Destination.
Search for Couchbase and select it from the list.
Complete the destination configuration form:
- Destination Name: Enter a descriptive name (e.g., Couchbase Destination).
- Connection String: Enter your Capella endpoint, for example:
```
couchbases://cb.xxxxxx.cloud.couchbase.com
```
- Username: Enter your destination database username (e.g., airbyte_dest_user).
- Password: Enter your destination database user's password.
- Bucket: Specify the destination bucket name (e.g., staging).
- Scope (Optional): Enter the scope name if needed (defaults to _default), for example:
```
_default
```
Click Set up destination to save and test the connection.

Airbyte will test the connection by creating a temporary collection and performing a test write.

Step 3: Understanding Destination Sync Modes

The Couchbase destination connector supports three sync modes:

Overwrite

Clears the destination collection before each sync and replaces with new data.

How it works:

Deletes all existing documents: DELETE FROM bucket.scope.collection
Inserts new documents from the sync

Document ID format:

If primary key is set: {stream_name}::{primary_key_value}
If no primary key: {stream_name}::{uuid4()}

When to use:

When the destination should mirror the source exactly
For small datasets
When historical data is not needed

Warning: All existing data in the collection is deleted on each sync!

Append

Adds all synced records as new documents, never updating existing ones.

How it works:

Always generates new UUID for each document
Never overwrites existing documents

Document ID format: {stream_name}::{uuid4()}

The stream_name:: prefix ensures document ID uniqueness when multiple Airbyte streams write to the same Couchbase collection, preventing ID collisions between different source streams.

Example: Without prefix, two streams with id=123 would both create document 123 (collision). With prefix: streamA::123 and streamB::123 remain separate.

When to use:

When you need complete history of all syncs
For audit trails
When duplicate records are acceptable

Note: This mode will continuously grow your collection with every sync.

Append Dedup

Maintains unique records per primary key, updating existing documents when the same key is synced again.

How it works:

Uses primary key to generate deterministic document IDs
Upserts documents (insert if new, update if exists)

Document ID format: {stream_name}::{primary_key_values_joined}

Example with primary key ["id"]:

Document with id=123 → Document ID: airline::123

When to use:

When you want the latest version of each record (most common use case)
For maintaining a live replica of source data
When combined with incremental source sync

Requirements: Primary key must be configured in the connection settings.

Understanding Document Structure

All documents written to Couchbase by Airbyte follow this structure:

{
  "id": "stream_name::key_value",
  "type": "airbyte_record",
  "stream": "source_stream_name",
  "_airbyte_extracted_at": 1642526400000,
  "data": {
    // Original record data from source
  },
  "_ab_sync_mode": "append_dedup",
  "namespace": "optional_namespace"
}

Fields Explained:

id: Composite document ID (based on sync mode and primary key)
type: Always "airbyte_record"
stream: Name of the source stream
_airbyte_extracted_at: Unix timestamp (milliseconds) when Airbyte extracted the record from source
data: The actual record data from the source
_ab_sync_mode: Which sync mode was used
namespace: Optional logical grouping

Collection Management

Automatic Collection Creation: If a collection doesn't exist, the connector creates it automatically.

Collection Naming: Stream names are sanitized:

Invalid characters replaced with underscores
Must start with a letter
Maximum 251 characters

Part 3: Creating Connections

Now that you've configured both source and destination, let's create a connection to sync data.

Step 1: Create a New Connection

In Airbyte, click Connections in the left navigation
Click + New Connection
Select Source: Choose your Couchbase source (or any other configured source)
Select Destination: Choose your Couchbase destination (or any other configured destination)
Click Set up connection

Step 2: Configure Streams

Airbyte displays all discovered streams from your source. For each stream:

Enable/Disable Streams

Toggle the checkbox next to each stream
Only enabled streams will be synced
Start with a small test stream to verify configuration

Choose Sync Mode

For each enabled stream, select the appropriate sync mode combination:

Sync Mode Matrix:

Source Mode	Destination Mode	Result	Use Case
Full Refresh	Overwrite	Complete replacement each sync	Mirror source exactly
Full Refresh	Append	Multiple complete snapshots	Historical snapshots
Incremental	Append	All changes tracked	Complete audit trail
Incremental	Append Dedup	Current state maintained	Live replica

Configure Cursor Field

For incremental syncs:

Cursor Field: Automatically set to _ab_cdc_updated_at
This field tracks when documents were last modified
Do not change this unless you have a custom cursor field

Set Primary Key

Primary keys are required for "Append Dedup" destination mode:

Single Field Primary Key:

[["_id"]]

Composite Primary Key (multiple fields):

[["country"], ["city"]]

For Couchbase → Couchbase:

Use _id as primary key to maintain document key consistency
Or use business keys from your data model

Example Configuration:

Stream: travel-sample.inventory.airline
Sync Mode: Incremental | Append Dedup
Cursor: _ab_cdc_updated_at
Primary Key: [["_id"]]

Step 3: Configure Sync Schedule

Choose when syncs should run:

Manual

Syncs only when manually triggered
Good for testing and one-time migrations

Scheduled (Cron)

Custom cron expression:

0 */4 * * *    # Every 4 hours
0 2 * * *      # Daily at 2 AM
0 0 * * 0      # Weekly on Sunday

Basic Schedule

Predefined intervals:

Every hour
Every 6 hours
Every 12 hours
Every 24 hours

Step 4: Advanced Configuration

Connection Name: Give your connection a descriptive name

Production to Staging - Incremental Sync

Namespace Configuration:

Source namespace: Destination uses source-defined namespace
Custom format: Define your own namespace pattern
Destination default: Use destination's default namespace

Namespace Example:

source_namespace: inventory
destination: staging bucket, inventory scope

Normalization: Not supported for Couchbase destination

Data Transformation: dbt transformations not available for Couchbase destination

Step 5: Test and Activate

Review your configuration summary
Click Test connection to verify settings
Once successful, click Set up connection
The connection is now created but not yet run

Step 6: Run Your First Sync

Manual Trigger

Navigate to your connection
Click Sync now button
Monitor the sync progress in real-time

What Happens During First Sync:

For Full Refresh: All documents are extracted
For Incremental: All documents are extracted (initial sync)
Data is written to destination in batches

Monitor Sync Progress

Watch the sync job status:

Running: Sync in progress
Succeeded: Sync completed successfully
Failed: Check logs for errors
Cancelled: Manually stopped

Sync Statistics:

Records synced
Data volume
Sync duration
Records committed

Verify Data in Couchbase

After the sync completes:

Open Couchbase Capella console
Navigate to Data Tools → Documents
Select your destination bucket and scope
Browse the synced collections
Verify document structure and data

Part 4: Common Integration Patterns

Pattern 1: Couchbase to Couchbase (Cross-Bucket Replication)

Use Airbyte to replicate data between different Couchbase buckets within the same cluster or across different clusters. This pattern is useful for creating staging environments, maintaining backup copies, or distributing data across geographic regions. Configure the source connector to read from one bucket and the destination connector to write to another bucket using appropriate sync modes based on your requirements.

Pattern 2: External Sources to Couchbase (Data Ingestion)

Use Airbyte to consolidate data from multiple external sources into Couchbase. By centralizing data in Couchbase, you benefit from its flexible JSON document model, powerful N1QL query capabilities, and built-in full-text search. This pattern is ideal for building unified data platforms where Couchbase serves as your operational data store.

Pattern 3: Couchbase to External Destinations (Data Export)

Use Airbyte to share Couchbase data with downstream systems for specialized workloads like reporting or archival. While Couchbase can handle most operational and analytical needs directly, this pattern enables integration with legacy systems or specialized tools that require data feeds from your Couchbase cluster.

Performance and Best Practices

Source Performance Optimization

1. Incremental vs Full Refresh

Use Incremental When:

Collections are large
Changes are a small percentage of total data
Network bandwidth is limited

Use Full Refresh When:

Collections are small
Most documents change frequently
You need guaranteed consistency

3. Start Date Configuration

For the initial incremental sync, use start_date to limit the data window:

{
  "start_date": "2025-01-01T00:00:00Z"
}

Benefits:

Faster initial sync
Reduced resource usage
Can backfill historical data later if needed

Destination Performance Optimization

1. Collection Pre-creation

For large initial syncs, pre-create collections to avoid collection creation overhead during sync.

2. Sync Scheduling

Off-Peak Scheduling: Schedule large syncs during low-traffic periods.

# Daily full refresh at 2 AM
0 2 * * *

# Incremental sync every 30 minutes during business hours
*/30 9-17 * * MON-FRI

Parallel Syncs: Airbyte can sync multiple streams in parallel. Monitor cluster resources:

CPU utilization
Memory usage
Disk I/O
Network throughput

Network Optimization

1. Use TLS Connections

Always use couchbases:// for production:

Data encrypted in transit
Required for Capella
Minimal performance overhead

2. Regional Placement

For Airbyte Cloud:

Choose the Airbyte region closest to your Capella cluster
Reduces latency and data transfer costs

For Self-Hosted Airbyte:

Deploy in the same cloud region as Capella
Use private endpoints if available

3. Connection Pooling

The connector uses configurable timeout settings for key-value and query operations.

If you experience timeouts, consider:

Scaling up Capella cluster
Reducing sync frequency

Capella-Specific Optimizations

1. Bucket Configuration

Replicas: Consider replica count for destination buckets

Production: 2 replicas
Staging: 1 replica (acceptable for non-critical data)

Compression: Enable compression for large documents

Reduces storage costs
Minimal CPU overhead

3. Monitoring Metrics

Track these Capella metrics during syncs:

Operations per second (ops/sec)
Disk write queue
Memory usage percentage
Network bytes in/out

Set up alerts for high CPU, memory, and disk queue thresholds based on your cluster's normal operating levels.

Cost Optimization

1. Data Transfer

For Capella:

Incremental syncs reduce data transfer charges
Same-region syncs minimize egress costs
Use compression when possible

2. Storage

Append vs Append Dedup:

Append mode grows storage linearly
Append Dedup maintains constant size per unique record

3. Compute

Optimize Sync Frequency:

Not all data needs real-time replication
Batch less critical syncs to hourly or daily
Use incremental mode to reduce cluster load

Security Best Practices

1. Credential Management

Database Users:

Create separate users for source and destination
Use strong, unique passwords
Rotate credentials quarterly

Permissions:

Source User: Read access
Destination User: Read/Write access

2. Network Security

IP Allowlisting:

Restrict to specific Airbyte IPs only
Regularly audit allowed IPs
Remove unused entries

Connection String:

Always use couchbases:// (TLS encrypted)
Never use couchbase:// in production

3. Data Privacy

For Sensitive Data:

Use field-level encryption in Couchbase
Consider data masking in non-production syncs
Implement access logging and auditing

Compliance Considerations:

GDPR: Ensure right to deletion is maintained
HIPAA: Use appropriate network and encryption settings
SOC 2: Enable audit logging on both Airbyte and Couchbase

Data Quality Best Practices

1. Schema Consistency

Maintain consistent field types across documents:

// Good - consistent types
{"order_id": "12345", "total": 99.99}
{"order_id": "12346", "total": 149.99}

// Bad - inconsistent types
{"order_id": "12345", "total": 99.99}
{"order_id": 12346, "total": "149.99"}

2. Primary Key Selection

Good Primary Keys:

Immutable values (don't change over time)
Unique across all documents
Business-meaningful when possible

Examples:

User documents: user_id
Order documents: order_id
Events: Composite key [["user_id"], ["event_timestamp"]]

Avoid:

Auto-incrementing integers that may conflict
Timestamps (not guaranteed unique)
Fields that can be null

3. Null Handling

The destination connector converts nulls to empty strings.

4. Date/Time Standardization

Use ISO 8601 format for all timestamps:

{
  "created_at": "2025-01-15T14:30:00Z",
  "updated_at": "2025-01-20T09:15:30Z"
}

Benefits:

Sortable
Timezone-aware
Widely compatible

Monitoring and Troubleshooting

Monitoring Your Syncs

Airbyte Dashboard

Monitor sync health in the Airbyte UI:

Connection Status:

Navigate to your connection
Check "Last Sync Status" badge
Review sync history for patterns

Key Metrics to Track:

Success Rate: Percentage of successful syncs
Sync Duration: Track for performance degradation
Records Synced: Verify expected data volume

Couchbase Capella Monitoring

During Active Syncs, Monitor:

Cluster Metrics (Settings → Metrics):
- CPU utilization
- Memory usage
- Disk I/O
- Network throughput
Bucket Statistics:
- Operations per second
- Item count
- Data size
- Disk usage
Query Performance (Query → Workbench): Monitor query execution times during sync periods

Common Issues and Solutions

Connection Issues

"Connection check failed": Verify IP is allowlisted in Capella, connection string format is correct (couchbases://cb.xxxxx.cloud.couchbase.com), and credentials are valid.

"Network timeout": Check firewall rules and ensure Airbyte can reach Capella on port 11207.

Source Connector Issues

"No streams discovered": Ensure bucket has collections with documents and user has read permissions.

"Incremental sync not detecting changes": Reset connection state in Airbyte (Connection Settings → Advanced → Reset Data).

Destination Connector Issues

"Collection creation failed": Verify user has Read/Write access and scope exists.

"Batch write timeout": Scale up Capella cluster or reduce sync frequency.

Sync Performance Issues

"Syncs are very slow": Switch to Incremental mode, disable unused streams, and ensure Airbyte and Capella are in the same region.

Debugging Tips

1. Enable Detailed Logging

In Airbyte:

Navigate to failing sync job
Click "View Logs"
Look for ERROR and WARN level messages
Check for stack traces

Common Log Patterns:

ERROR - Failed to connect: timeout
→ Network/connectivity issue

ERROR - Schema validation failed
→ Data type mismatch

WARN - Retrying batch write (attempt 2/3)
→ Temporary issue, may resolve itself

2. Test Connectivity

Use Couchbase SDK or cbshell to test connection independently:

# Using cbshell
cbshell -c couchbases://cb.xxxxx.cloud.couchbase.com \
        -u username -p password

If this fails, the issue is with Couchbase access, not Airbyte.

3. Verify Data Manually

After a sync, spot-check data in Couchbase via the Capella console Documents browser.

4. Reset Connection State

If incremental sync is stuck or behaving incorrectly:

Navigate to Connection in Airbyte
Go to Settings → Advanced
Click "Reset Data"
Confirm the reset
Next sync will be a full refresh, then resume incrementally

Warning: This clears sync state and forces a complete re-sync.

5. Check Couchbase System Tables

Query system tables for diagnostic info via the Capella Query Workbench to check active queries during sync operations.

Getting Help

Airbyte Community:

Airbyte Slack Community
GitHub Issues
Tag: connector: source-couchbase or connector: destination-couchbase

Couchbase Community:

Couchbase Forums
Discord Server
Category: Data Integration / ETL

Contents