This notebook demonstrates how to build an intelligent agent using OpenAI agents framework that can interact with a Couchbase database. The key to this interaction is the Model Context Protocol (MCP), which allows the AI agent to seamlessly connect to and use Couchbase as a tool.
The Model Context Protocol (MCP) is an open standard designed to standardize how AI assistants and applications connect to and interact with external data sources, tools, and systems. Think of MCP as a universal adapter that allows AI models to seamlessly access the context they need to produce more relevant, accurate, and actionable responses.
Key Goals and Features of MCP:
MCP aims to break down data silos, making it easier for AI to integrate with real-world applications and enterprise systems, leading to more powerful and context-aware AI solutions.
MCP Typically Follows a Client-Server Architecture:
couchbase-mcp-server project fulfills this role for Couchbase.Please follow the instructions to generate the OpenAI credentials.
To get started with Couchbase Capella, create an account and use it to deploy a forever free tier operational cluster. This account provides you with an environment where you can explore and learn about Capella with no time constraint.
To learn more, please follow the instructions.
When running Couchbase using Capella, the following prerequisites need to be met.
Before running this notebook, ensure you have the following prerequisites met:
Set Environment Variables: This notebook loads the OpenAI API key and other environment variables from the .env file. Include the following:
OPENAI_API_KEY=your_openai_api_key_here
CB_CONNECTION_STRING=your_couchbase_connection_string
CB_USERNAME=your_couchbase_username
CB_PASSWORD=your_couchbase_password
CB_BUCKET_NAME=your_target_bucket # e.g., travel-sampleWe have already included a .env.sample file. Change the file name to .env and fill in the environment variables.
Setup uv: uv is a modern and fast python package and project manager. We will use uv to run the MCP server. Install uv from here.
Python Libraries: Install the necessary libraries by running the code cell below.
%pip install -q 'openai-agents==0.9.3' 'python-dotenv==1.2.1'Note: you may need to restart the kernel to use updated packages.This cell imports the essential Python tools for our project:
dotenv: For loading secret API keys and other settings from a .env file.agents (Agent, Runner, gen_trace_id, trace): Custom modules related to the agent framework being used, providing core agent functionalities, a way to run agents, generate trace IDs for debugging/monitoring, and tracing capabilities.agents.mcp (MCPServer, MCPServerStdio): Custom modules for interacting with the Model Context Protocol (MCP), specifically for setting up an MCP server, possibly one that communicates over standard input/output (stdio).Running this cell makes all these components ready to use.
from dotenv import load_dotenv
from agents import Agent, Runner, gen_trace_id, trace
from agents.mcp import MCPServer, MCPServerStdio
import os
load_dotenv()TrueThis cell defines an asynchronous function qna(agent) that we'll use to interact with our agent.
agent as an argument.message strings) that are posed to the agent.await Runner.run(starting_agent=agent, input=message) to get the agent's response. The Runner.run method is responsible for executing the agent with the given input.async def qna(agent):
message = "Tell me about the database that you are connected to."
print(f"\n\n**Running**: {message}\n")
result = await Runner.run(starting_agent=agent, input=message)
print(result.final_output)
print('-'*50)
message = "List out the top 5 hotels by the highest aggregate rating?"
print(f"\n\n**Running**: {message}\n")
result = await Runner.run(starting_agent=agent, input=message)
print(result.final_output)
print('-'*50)
message = "Recommend me a flight and hotel from New York to San Francisco"
print(f"\n\n**Running**: {message}\n")
result = await Runner.run(starting_agent=agent, input=message)
print(result.final_output)
print('-'*50)
message = "I'm going to the UK for 1 week. Recommend some great spots to visit for sightseeing. Also mention the respective prices of those places for adults and kids."
print(f"\n\n**Running**: {message}\n")
result = await Runner.run(starting_agent=agent, input=message)
print(result.final_output)
print('-'*50)
message = "My budget is around 30 pounds a night. What will be the best hotel to stay in?"
print(f"\n\n**Running**: {message}\n")
result = await Runner.run(starting_agent=agent, input=message)
print(result.final_output)
print('-'*50)The system prompt is a crucial piece of instruction given to the Large Language Model (LLM) that powers our agent. It sets the context, defines the agent's persona, capabilities, and constraints.
In this system prompt:
inventory scope, so use only that scope." This focuses the agent on the relevant part of the travel-sample database.inventory scope.A well-crafted system prompt significantly improves the agent's performance and reliability.
system_prompt = """Couchbase organizes data with the following hierarchy (from top to bottom):
1. Cluster: The overall container of all Couchbase data and services.
2. Bucket: A bucket is similar to a database in traditional systems. Each bucket contains multiple scopes. Example: "users", "analytics", "products"
3. Scope: A scope is a namespace within a bucket that groups collections. Scopes help isolate data for different microservices or tenants. Default scope name: _default
4. Collection: The equivalent of a table in relational databases. Collections store JSON documents. Default collection name: _default
5. Document: The atomic data unit (usually JSON) stored in a collection. Each document has a unique key within its collection.
IMPORTANT RULES:
- ALWAYS use the tools to query the database to answer questions. NEVER answer from your own knowledge.
- If the database does not contain relevant data, say so explicitly rather than making up an answer.
IMPORTANT SQL++ Query Rules:
- Use the tools to read the database and answer questions based on this database
- The data is inside `inventory` scope, so use only that scope
- Use only the collection name in the FROM clause (e.g., FROM `hotel`)
- Collection names and top-level field names should be in backticks
- For nested fields, use dot notation WITHOUT backticks around each part
CORRECT: `hotel`.reviews[0].ratings.Overall
WRONG: `hotel`.`reviews`.`ratings`.`Overall`
- When accessing nested objects or arrays, use bracket notation or dot notation directly
Examples:
- hotel.reviews[0].author
- hotel.geo.lat
Available Collections in `inventory` scope:
- `airline`: Airline information (name, callsign, iata, icao, country, etc.)
- `airport`: Airport data (airportname, faa, city, country, geo, tz, etc.)
- `hotel`: Hotel/accommodation data (see structure below)
- `landmark`: Sightseeing spots, tourist attractions, restaurants, activities (see structure below)
- `route`: Flight route information (airline, sourceairport, destinationairport, distance, schedule, etc.)
Hotel Document Structure:
- Top-level fields: name, city, country, state, address, description, price, type, id, vacancy, pets_ok etc.
- address: A string field containing the street address (e.g., "321 Castro St")
- city, country, state: Top-level string fields (e.g., city = "San Francisco", country = "United States")
- geo: Object with fields {lat, lon, accuracy}
- reviews: Array of review objects with ratings and content
- To filter by city or country: WHERE `city` = "San Francisco" OR WHERE `country` = "United States"
- Do NOT use "addresses" (plural) - the field is "address" (singular)
Landmark Document Structure:
- Top-level fields: name, city, country, state, address, content, activity, price, phone, url, email, hours, directions, toll_free, image, geo, etc.
- activity: Type of landmark (e.g., "see", "do", "eat", "drink", "buy", "listing")
- content: Description of the landmark
- price: Price or admission information (may be a string or null)
- For sightseeing queries, use the `landmark` collection and filter by activity = "see" or "do"
- Example: SELECT `name`, `city`, `price`, `content`, `activity` FROM `landmark` WHERE `country` = "United Kingdom" AND `activity` = "see"
Airline Document Structure:
- Top-level fields: name, callsign, iata, icao, country, type, id
Route Document Structure:
- Top-level fields: airline, airlineid, sourceairport, destinationairport, stops, equipment, schedule, distance, type, id
- schedule: Array of objects with {day, utc, flight}
ARRAY Operations in SQL++:
- To aggregate data from arrays (like reviews), use UNNEST to flatten the array first
- CORRECT way to sum array values:
SELECT h.name, SUM(r.ratings.Overall) as total_rating
FROM `hotel` h
UNNEST h.reviews r
GROUP BY h.name
ORDER BY total_rating DESC
- WRONG ways (these will cause parser errors):
x SELECT name, SUM(ARRAY_SUM(ARRAY reviews[*].ratings.Overall FOR reviews IN...))
x SELECT name, ARRAY reviews[*].ratings.Overall FOR reviews...
x WHERE ANY a IN addresses SATISFIES... (wrong field name)
- Use UNNEST whenever you need to work with individual array elements in aggregations"""This cell sets up the agent.
run that initializes an AI agent.system_prompt), uses the "gpt-5.2" model, and is connected to the Couchbase MCP server (mcp_server).qna, to start the question-and-answer process with the database.async def run(mcp_server: MCPServer):
agent = Agent(
name="Assistant",
instructions=system_prompt,
model="gpt-5.2",
mcp_servers=[mcp_server],
)
await qna(agent=agent)This cell:
couchbase-mcp-server using MCPServerStdio.couchbase-mcp-server package from PyPI via uvx couchbase-mcp-server\n",.env file in the parent directory via python-dotenv, which was loaded in an earlier cell.server object is passed to the run function, allowing the AI agent to interact with Couchbase via MCP.The agent will then attempt to answer the series of questions defined in the qna function by:
async with MCPServerStdio(
params={
"command": "uvx",
"args": ["couchbase-mcp-server"],
"env": {
"CB_CONNECTION_STRING": os.getenv("CB_CONNECTION_STRING"),
"CB_USERNAME": os.getenv("CB_USERNAME"),
"CB_PASSWORD": os.getenv("CB_PASSWORD"),
"CB_BUCKET_NAME": os.getenv("CB_BUCKET_NAME")
}
}
) as server:
trace_id = gen_trace_id()
with trace(trace_id=trace_id, workflow_name="CouchbaseDemo"):
print(f"View trace: https://platform.openai.com/traces/trace?trace_id={trace_id}\n")
print("-"*25, "Starting Run", "-"*25)
await run(server)View trace: https://platform.openai.com/traces/trace?trace_id=trace_4ab60c8e3d7b49f58e797c795a44946f
------------------------- Starting Run -------------------------
**Running**: Tell me about the database that you are connected to.
You’re connected to a local **Couchbase Server cluster** (connection string: `couchbase://localhost`) using the **Administrator** user. The environment is configured in **read-only mode** (no writes).
## What’s in this cluster
### Buckets
- **`travel-sample`** (the only bucket present)
### Scopes & collections (in `travel-sample`)
- **`inventory`** (this is the scope I will use for data queries)
- `airline`
- `airport`
- `hotel`
- `landmark`
- `route`
- Other scopes also exist (e.g., `tenant_agent_00` … `tenant_agent_04` with `users`/`bookings`), plus `_default` and `_system`.
## Inventory data size (document counts)
Within scope **`inventory`**:
- `airline`: **187**
- `airport`: **1,968**
- `hotel`: **917**
- `landmark`: **4,495**
- `route`: **24,024**
## Indexes (inventory scope)
There are ready indexes including:
- Primary indexes on: `airline`, `airport`, `hotel`, `landmark`, `route`
- Secondary indexes such as:
- `airport(airportname)`, `airport(city)`, `airport(faa)`
- `hotel(city)`
- `landmark(city)`
- route-focused indexes (e.g., on `sourceairport`, and array/schedule-related fields)
## Cluster services & health
Services reachable and reporting **ok**:
- KV, Query, Analytics, Search, Views, Management
(latencies were a few milliseconds as reported by ping)
If you tell me what kind of data you want to explore (hotels, airports, routes, etc.), I can summarize key fields or run specific queries in the `inventory` scope.
--------------------------------------------------
**Running**: List out the top 5 hotels by the highest aggregate rating?
Top 5 hotels by highest **aggregate (sum) Overall rating** across all reviews:
1. **Hotel Eldorado** — aggregate_rating: **109** (review_count: 29)
2. **Hafod Lon Holiday Apartment** — aggregate_rating: **70** (review_count: 17)
3. **Uist Outdoor Centre** — aggregate_rating: **53** (review_count: 12)
4. **Lochmaddy Hotel** — aggregate_rating: **50** (review_count: 11)
5. **Radisson Blu** — aggregate_rating: **50** (review_count: 13)
--------------------------------------------------
**Running**: Recommend me a flight and hotel from New York to San Francisco
### Flight (nonstop)
**United Airlines (UA)** — **EWR → SFO** (Newark Liberty Intl → San Francisco Intl)
- Stops: **0 (nonstop)**
- Distance: **~4118 km**
- Aircraft/equipment listed: **739 / 319 / 738 / 752 / 73G** (varies by flight)
(Alternative nonstop also shown in the data: **Virgin America (VX)** on **EWR → SFO**.)
### Hotel (San Francisco, available + strong reviews)
**Wharf Inn** — **2601 Mason St, San Francisco, CA**
- Avg review rating: **5.0** (from **8** reviews)
- Price: **$160–$195**
- Notes: “Small, friendly, in the heart of Fisherman’s Wharf… free parking.”
If you tell me your travel dates and your preference (cheapest vs best-rated vs specific neighborhood), I can narrow it down further.
--------------------------------------------------
**Running**: I'm going to the UK for 1 week. Recommend some great spots to visit for sightseeing. Also mention the respective prices of those places for adults and kids.
Here are some **UK sightseeing (“see/do”) spots** from the database, with the **prices shown for adults and kids when explicitly provided**:
| Place | Where (city) | What it is | Adult price | Kids price |
|---|---|---|---:|---:|
| **Saltram House** | (city not listed) | Historic Georgian house & estate | **£10.00** | **£5.00** |
| **Thinktank** | (city not listed) | Science museum (hands-on exhibits; IMAX mentioned) | **£6.95** | **£4.95** |
| **Richborough Roman Fort and Amphitheatre** | Ash | Roman fort & amphitheatre (English Heritage) | **£5.00** | **£3.00** |
| **Castell-y-Bere** | (city not listed) | Ruined 13th-century Welsh castle in dramatic scenery | **Free entry** | **Free entry** |
| **Menai Suspension Bridge** | Bangor | Iconic bridge (Thomas Telford, completed 1826) | **Free** | **Free** |
Notes (based on what’s stored in the dataset):
- Many UK landmarks have `price = null` or a single non-split value (e.g., “£11”, “From £39 per person”), so I **can’t reliably give adult vs. child prices** for those from this data.
- If you tell me **which cities you’ll base yourself in** (e.g., London + Edinburgh), I can query a tighter list nearby and prioritize spots with clearly stated pricing.
--------------------------------------------------
**Running**: My budget is around 30 pounds a night. What will be the best hotel to stay in?
Based on the hotels in our database with a nightly `price` **≤ 30** and **at least 3 reviews**, the best-rated option is:
- **Stirling Youth Hostel** (Stirling, United Kingdom) — **£26/night**
- **Average review rating (Overall): 4.71/5**
- **Reviews counted: 7**
If you want, tell me what **city/country** you’re traveling to and whether **hostels are OK**, and I can narrow it down to the best match for your trip.
--------------------------------------------------