This notebook demonstrates how to build a ReAct (Reasoning and Acting) agent using LangChain and LangGraph that can interact with a Couchbase database. The key to this interaction is the Model Context Protocol (MCP), which allows the AI agent to seamlessly connect to and use Couchbase as a tool. Read more about LangChain's ReAct agent here.
The Model Context Protocol (MCP) is an open standard designed to standardize how AI assistants and applications connect to and interact with external data sources, tools, and systems. Think of MCP as a universal adapter that allows AI models to seamlessly access the context they need to produce more relevant, accurate, and actionable responses.
Key Goals and Features of MCP:
MCP aims to break down data silos, making it easier for AI to integrate with real-world applications and enterprise systems, leading to more powerful and context-aware AI solutions.
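To make the "universal adapter" idea more concrete: MCP messages are JSON-RPC 2.0 requests and responses, and tools are discovered and invoked through the `tools/list` and `tools/call` methods. The sketch below only builds such requests as plain dictionaries. The helper `jsonrpc_request`, the tool name `example_query_tool`, and its argument are illustrative assumptions; in this notebook the `mcp` client library builds and sends these messages for you.

```python
import json
from itertools import count

_ids = count(1)  # each JSON-RPC request carries an id so replies can be matched


def jsonrpc_request(method, params=None):
    """Build a JSON-RPC 2.0 request as a plain dict (illustrative helper)."""
    request = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        request["params"] = params
    return request


# Ask the server which tools it offers.
list_tools = jsonrpc_request("tools/list")

# Call one tool. The tool name and argument are hypothetical placeholders;
# the real names come from the server's answer to tools/list.
call_tool = jsonrpc_request(
    "tools/call",
    {"name": "example_query_tool", "arguments": {"query": "SELECT 1"}},
)

print(json.dumps(list_tools))
print(json.dumps(call_tool, indent=2))
```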
MCP Typically Follows a Client-Server Architecture:
couchbase-mcp-server project fulfills this role for Couchbase.
Please follow the instructions to generate the OpenAI credentials.
To get started with Couchbase Capella, create an account and use it to deploy a forever free tier operational cluster. This account provides you with an environment where you can explore and learn about Capella with no time constraint.
To learn more, please follow the instructions.
When running Couchbase using Capella, the following prerequisites need to be met.
Before running this notebook, ensure you have the following prerequisites met:
Set Environment Variables: This notebook loads the OpenAI API key and other environment variables from the .env file. Include the following:
OPENAI_API_KEY=your_openai_api_key_here
CB_CONNECTION_STRING=your_couchbase_connection_string
CB_USERNAME=your_couchbase_username
CB_PASSWORD=your_couchbase_password
CB_BUCKET_NAME=your_target_bucket # e.g., travel-sample
We have already included a .env.sample file. Change the file name to .env and fill in the environment variables.
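A missing variable tends to surface later as a confusing connection or authentication error, so it can help to check for gaps up front. The following is a minimal sketch, not part of the original notebook: `REQUIRED_VARS` and `missing_env_vars` are hypothetical names, and the check is meant to run after `load_dotenv()` has loaded the .env file.

```python
import os

# Variable names taken from the .env listing above.
REQUIRED_VARS = [
    "OPENAI_API_KEY",
    "CB_CONNECTION_STRING",
    "CB_USERNAME",
    "CB_PASSWORD",
    "CB_BUCKET_NAME",
]


def missing_env_vars(names, environ=None):
    """Return the names that are unset or empty in the given environment."""
    environ = os.environ if environ is None else environ
    return [name for name in names if not environ.get(name)]


missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All required environment variables are set.")
```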
Setup uv: uv is a modern and fast python package and project manager. We will use uv to run the MCP server. Install uv from here.
Python Libraries: Install the necessary libraries by running the code cell below.
%pip install -q 'langchain==1.2.10' 'langgraph==1.0.9' 'langchain-openai==1.1.10' 'langchain-mcp-adapters==0.2.1' 'python-dotenv==1.2.1'
Note: you may need to restart the kernel to use updated packages.
This cell imports the essential Python tools for our project:
- dotenv & os: For loading and using secret API keys and other settings from a .env file.
- mcp (ClientSession, StdioServerParameters, stdio_client): For connecting this notebook (as a client) to the MCP server, which in turn talks to Couchbase.
- langchain_mcp_adapters.tools (load_mcp_tools): To make the Couchbase tools (exposed via MCP) usable by our LangChain AI agent.
- langchain.agents (create_agent): To easily build a "ReAct" AI agent that can think and use tools.
- langgraph.checkpoint.memory (InMemorySaver): To help the agent remember past parts of the conversation.
- langchain_openai (ChatOpenAI): To connect to and use OpenAI's language models (like GPT-5).
Running this cell makes all these components ready to use.
from dotenv import load_dotenv
import os
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client
from langchain_mcp_adapters.tools import load_mcp_tools
from langchain.agents import create_agent
from langgraph.checkpoint.memory import InMemorySaver
from langchain_openai import ChatOpenAI
load_dotenv()
True
This cell defines an asynchronous function qna(agent) that we'll use to interact with our ReAct agent.
- It takes the agent as an argument.
- config = {"configurable": {"thread_id": "1"}}: This configuration is important for LangGraph agents. It uses a thread_id to maintain conversation state. Using the same thread_id across multiple calls to the agent allows it to remember previous interactions in that "thread."
- It defines the question (message) which we want to ask the agent. The agent queries the Couchbase MCP to get travel related data, formats it and presents it to the user.
async def qna(agent):
    config = {"configurable": {"thread_id": "1"}}
    message = "Tell me about the database that you are connected to."
    print(f"\n\n**Running:** {message}\n")
    result = await agent.ainvoke({"messages": message}, config)
    print(result["messages"][-1].content)
    print('-'*50)

    message = "List out the top 5 hotels by the highest aggregate rating?"
    print(f"\n\n**Running**: {message}\n")
    result = await agent.ainvoke({"messages": message}, config)
    print(result["messages"][-1].content)
    print('-'*50)

    message = "Recommend me a flight and hotel from New York to San Francisco"
    print(f"\n\n**Running**: {message}\n")
    result = await agent.ainvoke({"messages": message}, config)
    print(result["messages"][-1].content)
    print('-'*50)

    message = "I'm going to the UK for 1 week. Recommend some great spots to visit for sightseeing. Also mention the respective prices of those places for adults and kids."
    print(f"\n\n**Running**: {message}\n")
    result = await agent.ainvoke({"messages": message}, config)
    print(result["messages"][-1].content)
    print('-'*50)

    message = "My budget is around 30 pounds a night. What will be the best hotel to stay in?"
    print(f"\n\n**Running**: {message}\n")
    result = await agent.ainvoke({"messages": message}, config)
    print(result["messages"][-1].content)
    print('-'*50)
The system prompt is a crucial piece of instruction given to the Large Language Model (LLM) that powers our agent. It sets the context and defines the agent's persona, capabilities, and constraints.
In this system prompt:
- "The data is inside inventory scope, so use only that scope." This focuses the agent on the relevant part of the travel-sample database.
- inventory scope.
A well-crafted system prompt significantly improves the agent's performance and reliability.
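The prompt that follows tells the agent to flatten arrays with UNNEST before aggregating. What that query shape computes can be sketched in plain Python, with each loop standing in for one clause. The sample documents are made up for illustration and only mimic the hotel structure described in the prompt; `total_ratings` is a hypothetical helper, not something the agent or Couchbase runs.

```python
from collections import defaultdict

# Made-up documents shaped like the hotel documents described in the prompt.
hotels = [
    {"name": "Hotel A", "reviews": [{"ratings": {"Overall": 4}},
                                    {"ratings": {"Overall": 5}}]},
    {"name": "Hotel B", "reviews": [{"ratings": {"Overall": 3}}]},
    {"name": "Hotel C", "reviews": []},
]


def total_ratings(docs):
    """Sum ratings.Overall per hotel name, highest total first."""
    totals = defaultdict(int)
    for h in docs:                    # FROM `hotel` h
        for r in h["reviews"]:        # UNNEST h.reviews r
            # SUM(r.ratings.Overall) ... GROUP BY h.name
            totals[h["name"]] += r["ratings"]["Overall"]
    # ORDER BY total_rating DESC
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


print(total_ratings(hotels))
```

In this sketch a hotel with an empty reviews array contributes no rows, which matches how a plain (inner) UNNEST behaves.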
system_prompt = """Couchbase organizes data with the following hierarchy (from top to bottom):
1. Cluster: The overall container of all Couchbase data and services.
2. Bucket: A bucket is similar to a database in traditional systems. Each bucket contains multiple scopes. Example: "users", "analytics", "products"
3. Scope: A scope is a namespace within a bucket that groups collections. Scopes help isolate data for different microservices or tenants. Default scope name: _default
4. Collection: The equivalent of a table in relational databases. Collections store JSON documents. Default collection name: _default
5. Document: The atomic data unit (usually JSON) stored in a collection. Each document has a unique key within its collection.
IMPORTANT SQL++ Query Rules:
- Use the tools to read the database and answer questions based on this database
- The data is inside `inventory` scope, so use only that scope
- Use only the collection name in the FROM clause (e.g., FROM `hotel`)
- Collection names and top-level field names should be in backticks
- For nested fields, use dot notation WITHOUT backticks around each part
CORRECT: `hotel`.reviews[0].ratings.Overall
WRONG: `hotel`.`reviews`.`ratings`.`Overall`
- When accessing nested objects or arrays, use bracket notation or dot notation directly
Examples:
- hotel.reviews[0].author
- hotel.geo.lat
Hotel Document Structure:
- Top-level fields: name, city, country, state, address, description, price, type, id, vacancy, pets_ok etc.
- address: A string field containing the street address (e.g., "321 Castro St")
- city, country, state: Top-level string fields (e.g., city = "San Francisco", country = "United States")
- geo: Object with fields {lat, lon, accuracy}
- reviews: Array of review objects with ratings and content
- To filter by city or country: WHERE `city` = "San Francisco" OR WHERE `country` = "United States"
- Do NOT use "addresses" (plural) - the field is "address" (singular)
ARRAY Operations in SQL++:
- To aggregate data from arrays (like reviews), use UNNEST to flatten the array first
- CORRECT way to sum array values:
SELECT h.name, SUM(r.ratings.Overall) as total_rating
FROM `hotel` h
UNNEST h.reviews r
GROUP BY h.name
ORDER BY total_rating DESC
- WRONG ways (these will cause parser errors):
x SELECT name, SUM(ARRAY_SUM(ARRAY reviews[*].ratings.Overall FOR reviews IN...))
x SELECT name, ARRAY reviews[*].ratings.Overall FOR reviews...
x WHERE ANY a IN addresses SATISFIES... (wrong field name)
- Use UNNEST whenever you need to work with individual array elements in aggregations"""
This cell sets up two key components:
- model = ChatOpenAI(model="gpt-5.2"): This line initializes the LLM we'll be using. We're choosing gpt-5.2 from OpenAI.
- server_params = StdioServerParameters(...):
  - couchbase-mcp-server application.
  - command="uvx" and args=["couchbase-mcp-server"]: This uses uvx (from the uv tool) to automatically fetch and run the published couchbase-mcp-server package. There is no need to clone the repository; the package is installed and executed on the fly.
  - env={...}: This dictionary defines environment variables that will be passed to the MCP server process when it starts. These are crucial for the MCP server to connect to your Couchbase instance:
model = ChatOpenAI(model="gpt-5.2")
server_params = StdioServerParameters(
    command="uvx",
    args=["couchbase-mcp-server"],
    env={
        "CB_CONNECTION_STRING": os.getenv("CB_CONNECTION_STRING"),
        "CB_USERNAME": os.getenv("CB_USERNAME"),
        "CB_PASSWORD": os.getenv("CB_PASSWORD"),
        "CB_BUCKET_NAME": os.getenv("CB_BUCKET_NAME")
    }
)
The main function ties everything together to set up and run our agent:
- It starts the couchbase-mcp-server process using stdio_client and establishes a communication ClientSession with it.
- The session is initialized. Then, load_mcp_tools queries the MCP server to get the available Couchbase tools and prepares them for LangChain.
- An InMemorySaver is created to allow the agent to remember conversation history.
- The create_agent function builds our AI agent, providing it with the language model, the Couchbase tools, our system_prompt, and the checkpoint for memory.
- It calls the qna function, passing the created agent to start the question-and-answer process with the database.
async def main():
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            # Initialize the connection
            print("Initializing connection...")
            await session.initialize()

            # Get tools
            print("Loading tools...")
            tools = await load_mcp_tools(session)

            # Create and run the agent
            print("Creating agent...")
            checkpoint = InMemorySaver()
            agent = create_agent(
                model,
                tools,
                system_prompt=system_prompt,
                checkpointer=checkpoint
            )
            print("-"*25, "Starting Run", "-"*25)
            await qna(agent)
This final cell simply executes the await main() function.
When you run this cell:
- couchbase-mcp-server process will be started in the background.
- qna function by:
You will see the questions and the agent's answers printed below. This demonstrates the end-to-end flow of a natural language query being translated into database actions and then into a user-friendly response, all orchestrated by the LangChain agent using MCP.
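To try questions of your own, a more compact variant of qna can loop over a list instead of repeating the same three lines per question. The sketch below assumes only what the notebook already relies on: an agent with an `ainvoke` coroutine that returns a dict whose last `messages` entry has a `content` attribute. `ask_all` and `EchoAgent` are hypothetical names; `EchoAgent` is a stand-in so the sketch runs without an LLM or a database, and with the real agent `ask_all` would be awaited inside main() in place of qna.

```python
import asyncio
from types import SimpleNamespace


async def ask_all(agent, questions, thread_id="1"):
    """Send each question on the same thread and collect the final answers."""
    config = {"configurable": {"thread_id": thread_id}}
    answers = []
    for question in questions:
        result = await agent.ainvoke({"messages": question}, config)
        answers.append(result["messages"][-1].content)
    return answers


class EchoAgent:
    """Hypothetical stand-in agent that echoes the question back."""

    async def ainvoke(self, payload, config):
        reply = SimpleNamespace(content=f"echo: {payload['messages']}")
        return {"messages": [reply]}


# asyncio.run is for plain scripts; in a notebook cell, await ask_all(...) instead.
if __name__ == "__main__":
    print(asyncio.run(ask_all(EchoAgent(), ["Which airlines fly from JFK?"])))
```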
await main()
Initializing connection...
Loading tools...
Creating agent...
------------------------- Starting Run -------------------------
**Running:** Tell me about the database that you are connected to.
You’re connected to a **Couchbase** cluster running at:
- **Connection string:** `couchbase://localhost`
- **Auth user:** `Administrator`
- **Server status:** running
- **Access mode:** **read-only** (documents and queries are restricted to read-only)
## Buckets available
- `travel-sample` (the sample travel dataset)
## Scopes & collections in `travel-sample`
- **`inventory`** (this is the main dataset for travel domain data)
- `airline`
- `airport`
- `hotel`
- `landmark`
- `route`
- **Tenant sample scopes** (multi-tenant style)
- `tenant_agent_00`: `users`, `bookings`
- `tenant_agent_01`: `users`, `bookings`
- `tenant_agent_02`: `users`, `bookings`
- `tenant_agent_03`: `users`, `bookings`
- `tenant_agent_04`: `users`, `bookings`
- `_default`: `_default`
- `_system`: `_query`, `_mobile`
## What’s inside the `inventory` scope (high level)
These collections store JSON documents with a top-level `type` field matching the entity:
- **`hotel`** (~917 docs): hotels with fields like `name`, `address`, `city`, `country`, `price`, `pets_ok`, `vacancy`, `geo`, and an array of `reviews` (each with `ratings.Overall`, etc.).
- **`airport`** (~1000 docs): airports with `airportname`, `city`, `country`, `geo` (lat/lon/alt), timezone `tz`, etc.
- **`landmark`** (~1000 docs): points of interest with `name`, `activity`, `city`, `country`, `geo`, and optional `address`, `hours`, `price`, etc.
- **`route`** (~1000 docs): flight routes with `sourceairport`, `destinationairport`, `airline`, `distance`, and an array `schedule`.
- **`airline`** (~187 docs): airlines with `name`, `country`, `icao`, and optional `iata`/`callsign`.
If you tell me what you’re trying to do (explore counts, example documents, common queries, indexes, etc.), I can pull specific details from the `inventory` collections.
--------------------------------------------------
**Running**: List out the top 5 hotels by the highest aggregate rating?
Top 5 hotels by **highest aggregate (sum) of `reviews[].ratings.Overall`**:
| Rank | Hotel | Total overall rating (sum) | Review count |
|---:|---|---:|---:|
| 1 | Hotel Eldorado | 109 | 29 |
| 2 | Hafod Lon Holiday Apartment | 70 | 17 |
| 3 | Uist Outdoor Centre | 53 | 12 |
| 4 | Lochmaddy Hotel | 50 | 11 |
| 5 | Radisson Blu | 50 | 13 |
--------------------------------------------------
**Running**: Recommend me a flight and hotel from New York to San Francisco
Here’s one solid **flight + hotel** recommendation (using what’s in the `inventory` data).
## Flight (New York → San Francisco)
- **Route:** **EWR → SFO** (Newark Liberty → San Francisco Intl)
- **Stops:** **0 (nonstop)**
- **Airlines available on this nonstop route:** **United (UA)** or **Virgin America (VX)**
- **Approx. distance:** ~**4118 km**
- **Airports:**
- New York area: **EWR/JFK/LGA** (I found nonstop options from **EWR** and **JFK** to **SFO**)
- San Francisco: **SFO (San Francisco Intl)**
If you prefer **JFK → SFO**, nonstop airlines in the data include **AA, DL, UA, VX, AS, B6, US**.
## Hotel (San Francisco)
Top-rated (by average review overall rating) among hotels currently marked with **`vacancy` = true**:
- **Wharf Inn** — **2601 Mason St**, San Francisco, CA
- **Avg overall rating:** 5.0 (from 8 reviews)
- **Vacancy:** true
If you tell me your preferred NYC departure airport (JFK/LGA/EWR), travel dates, and whether you care about free breakfast/parking/pets, I can narrow it down further.
--------------------------------------------------
**Running**: I'm going to the UK for 1 week. Recommend some great spots to visit for sightseeing. Also mention the respective prices of those places for adults and kids.
Some great **UK sightseeing spots** from the database (all are landmarks in the **United Kingdom**). I’m including the **adult/kid prices where the stored `price` text provides them**; in a few cases the data only lists a single price or uses concessions/family tickets instead of “kids”.
### 1) Glasgow School of Art (Glasgow)
- **Why go:** iconic architecture/design (Charles Rennie Mackintosh)
- **Price (from data):** **£9.75 adults**, **£4.75 youth** (also £8 students/seniors)
- **Address:** 167 Renfrew St
### 2) Glasgow Science Centre (Glasgow)
- **Why go:** very strong hands-on museum for a 1-week trip, great with kids
- **Price (from data):** **£10.50 adults**, **£8.50 children/seniors**
- Extras listed: add **£2.50** for planetarium/IMAX (each), **£4.95** for the tower
- **Address:** 50 Pacific Quay
### 3) Housesteads (Hadrian’s Wall) (Northumberland)
- **Why go:** one of the best Roman sites on Hadrian’s Wall
- **Price (from data):** **£6.20 adults**, **£3.70 children** (concessions £5.60)
- **Address:** Haydon Bridge, NE47 6NN
### 4) Vindolanda (Northumberland)
- **Why go:** major Roman fort/archaeology site near Hadrian’s Wall
- **Price (from data):** **£10 adults**, **£5.50 children** (also mentions a joint ticket option)
- **Address:** (not provided in the data row)
### 5) Kelvingrove Art Gallery and Museum (Glasgow)
- **Why go:** top museum with huge variety (art, natural history, etc.)
- **Price (from data):** **Free** (so adults/kids: **£0 / £0**)
- **Address:** Argyle Street, West End
### 6) Pevensey Castle (Pevensey)
- **Why go:** atmospheric castle/fortifications; easy classic “England castle” stop
- **Price (from data):** **£5.00 adults**, **£3.00 children** (concessions £4.50)
- **Address:** Castle Road, Pevensey, BN24 5LE
### 7) Rievaulx Abbey (North Yorkshire)
- **Why go:** dramatic abbey ruins in a beautiful setting
- **Price (from data):** **£6.00 adults**, **£3.60 children** (concessions £5.40)
- **Address:** Rievaulx, YO62 5LB
If you tell me which cities you’ll base yourself in (e.g., **London + York**, or **Edinburgh + Highlands**, etc.), I can turn this into a tight **7-day route** with nearby landmarks and more entries that explicitly list adult/child pricing.
--------------------------------------------------
**Running**: My budget is around 30 pounds a night. What will be the best hotel to stay in?
With a ~**£30/night** budget, the best match I can see in the UK hotel data **that’s currently showing `vacancy = true`** is:
## Rooms Inn (Newcastle upon Tyne)
- **Price:** *“Double £30–85, breakfast extra”* (so £30/night is achievable depending on date/room)
- **Vacancy:** true
- **Address:** 40 West Parade, NE4 7LB, Newcastle upon Tyne
A couple of other ~£30 options exist in the data, but they’re either **not available** (`vacancy = false`) or are **priced per person** rather than per room/night:
- **The Wellington (London)** — “From £30” but `vacancy = false`
- **Holmefield Gardens B&B (Barrowford)** — “From £30 per person breakfast included” (`vacancy = true`)
If you tell me **which city in the UK** you’re staying in (London vs elsewhere) and whether **£30 is per room or per person**, I can refine to the best pick for your exact plan.
--------------------------------------------------
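The `await main()` cell relies on the notebook kernel already running an event loop, which is why top-level await works there. In a plain Python script no loop is running yet, so the usual approach is `asyncio.run`. The sketch below uses a stand-in `main()` so that it runs on its own; in a real script, `main` would be the function defined earlier in this notebook, and `run` is a hypothetical wrapper name.

```python
import asyncio


async def main():
    # Stand-in for the notebook's main(); the real one starts the MCP server
    # and runs the agent.
    await asyncio.sleep(0)
    return "done"


def run():
    """Entry point for a plain script, where no event loop is running yet."""
    return asyncio.run(main())


if __name__ == "__main__":
    print(run())
```

Calling `asyncio.run` from inside a notebook cell raises a RuntimeError because a loop is already running, so keep `await main()` there.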