In this guide, we will walk you through building a Retrieval Augmented Generation (RAG) application using Couchbase Capella as the database and OpenAI's gpt-4o as the large language model (LLM). We will use OpenAI's text-embedding-3-large model to generate embeddings.
This notebook demonstrates how to build a RAG system using:
We leverage Couchbase's Search Vector Index to create and manage vector indexes, enabling efficient semantic search capabilities. The Search Vector Index provides the infrastructure for storing, indexing, and querying high-dimensional vector embeddings alongside traditional text search functionality.
Semantic search goes beyond simple keyword matching by understanding the context and meaning behind the words in a query, making it an essential tool for applications that require intelligent information retrieval. This tutorial will equip you with the knowledge to create a fully functional RAG system using OpenAI Services and LlamaIndex. Alternatively, if you want to perform semantic search using the Hyperscale or Composite Vector Index, please take a look at this.
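To make the contrast with keyword matching concrete, here is a minimal sketch of how semantic search can rank documents by comparing embedding vectors with cosine similarity. The three-dimensional vectors and document names are made up for illustration (real text-embedding-3-large vectors have 3,072 dimensions), and in the actual application the comparison happens inside the Couchbase vector index, using whichever similarity metric the index definition specifies, rather than in Python.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" (hypothetical values, not produced by a real model)
documents = {
    "article about football": [0.9, 0.1, 0.0],
    "article about cooking": [0.0, 0.2, 0.9],
}
query_vector = [0.8, 0.2, 0.1]  # pretend embedding of the query "soccer results"

# Rank documents from most to least similar to the query
ranked = sorted(
    documents,
    key=lambda name: cosine_similarity(query_vector, documents[name]),
    reverse=True,
)
print(ranked[0])
```

Note that the query shares no keywords with the top-ranked document; the match comes entirely from the vectors pointing in a similar direction.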
This tutorial is available as a Jupyter Notebook (.ipynb file) that you can run interactively. You can access the original notebook here.
You can either download the notebook file and run it on Google Colab or run it on your system by setting up the Python environment.
To get started with Couchbase Capella, create an account and use it to deploy an operational cluster.
To learn more, please follow the instructions.
When running Couchbase using Capella, the following prerequisites need to be met:
In order to create the RAG application, we need an embedding model to ingest the documents for Vector Search and a large language model (LLM) for generating the responses based on the context.
For this implementation, we'll use OpenAI's models which provide state-of-the-art performance for both embeddings and text generation:
Embedding Model: We'll use OpenAI's text-embedding-3-large model, which provides high-quality embeddings with 3,072 dimensions for semantic search capabilities.
Large Language Model: We'll use OpenAI's gpt-4o model for generating responses based on the retrieved context. This model offers excellent reasoning capabilities and can handle complex queries effectively.
Prerequisites for OpenAI Integration:
For more details about OpenAI's models and pricing, please refer to the OpenAI documentation.
To build our RAG system, we need a set of libraries. The libraries we install handle everything from connecting to databases to performing AI tasks. Each library has a specific role: Couchbase libraries manage database operations, LlamaIndex handles AI model integrations, and we will use the OpenAI SDK for generating embeddings and calling OpenAI's language models.
# Install required packages
%pip install --no-user --quiet datasets==3.6.0 llama-index==0.14.13 llama-index-vector-stores-couchbase==0.6.0 llama-index-embeddings-openai==0.5.1 llama-index-llms-openai==0.6.18 python-dotenv==1.2.1
Note: you may need to restart the kernel to use updated packages.
The script starts by importing the libraries required for various tasks, including handling JSON, logging, time tracking, Couchbase connections, embedding generation, and dataset loading.
import getpass
import hashlib
import json
import logging
import os
import sys
import time
from datetime import timedelta
from dotenv import load_dotenv
from couchbase.auth import PasswordAuthenticator
from couchbase.cluster import Cluster
from couchbase.exceptions import CouchbaseException
from couchbase.management.buckets import CreateBucketSettings
from couchbase.management.search import SearchIndex
from couchbase.options import ClusterOptions
from datasets import load_dataset
from llama_index.core import Settings, Document, VectorStoreIndex
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.schema import MetadataMode
from llama_index.vector_stores.couchbase import CouchbaseSearchVectorStore
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI
In this section, we collect the configuration settings the application needs. These include sensitive information such as database credentials and API keys, along with the bucket, scope, collection, and index names. Instead of hardcoding these details into the script, we read them from environment variables (or from a .env file loaded with load_dotenv()). If OPENAI_API_KEY is not set, the user is prompted for it at runtime; the Couchbase settings fall back to the defaults shown in the code.
The script also validates that all required settings are present and raises an error if any is missing, so a misconfigured environment fails early rather than partway through the run.
OPENAI_API_KEY is your OpenAI API key which can be obtained from your OpenAI dashboard at platform.openai.com.
INDEX_NAME is the name of the Search Vector Index we will use for vector search operations.
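The validation step described above can be sketched as a small helper that reports exactly which settings are missing. The function name `find_missing_settings` and the sample values are hypothetical and not part of the notebook, which uses a single `all(...)` check instead.

```python
def find_missing_settings(settings):
    """Return the names of settings whose values are empty or None."""
    return [name for name, value in settings.items() if not value]

# Hypothetical values for illustration only
settings = {
    "CB_HOST": "couchbase://localhost",
    "CB_USERNAME": "Administrator",
    "CB_PASSWORD": "",  # simulate a value that was left blank
    "INDEX_NAME": "vector_search_llamaindex",
}

missing = find_missing_settings(settings)
if missing:
    print(f"Missing configuration variables: {', '.join(missing)}")
```

Naming the missing variables in the message makes a misconfigured environment quicker to diagnose than a generic error.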
load_dotenv()
OPENAI_API_KEY = os.getenv('OPENAI_API_KEY') or getpass.getpass('Enter your OpenAI API key: ')
CB_HOST = os.getenv('CB_HOST', 'couchbase://localhost') or input('Enter Couchbase host (default: couchbase://localhost): ') or 'couchbase://localhost'
CB_USERNAME = os.getenv('CB_USERNAME', 'Administrator') or input('Enter Couchbase username (default: Administrator): ') or 'Administrator'
CB_PASSWORD = os.getenv('CB_PASSWORD', 'password') or getpass.getpass('Enter Couchbase password (default: password): ') or 'password'
CB_BUCKET_NAME = os.getenv('CB_BUCKET_NAME', 'vector-search-testing') or input('Enter Couchbase bucket name: ')
SCOPE_NAME = os.getenv('SCOPE_NAME', 'shared') or input('Enter scope name: ')
COLLECTION_NAME = os.getenv('COLLECTION_NAME', 'llamaindex') or input('Enter collection name: ')
INDEX_NAME = os.getenv('INDEX_NAME', 'vector_search_llamaindex') or input('Enter index name: ')
if not all([OPENAI_API_KEY, CB_HOST, CB_USERNAME, CB_PASSWORD, CB_BUCKET_NAME, SCOPE_NAME, COLLECTION_NAME, INDEX_NAME]):
    raise ValueError("All configuration variables must be provided.")
if 'OPENAI_API_KEY' not in os.environ:
    os.environ['OPENAI_API_KEY'] = OPENAI_API_KEY
Logging is essential for tracking the execution of our script and debugging any issues that may arise. We set up a logger that will display information about the script's progress, including timestamps and log levels.
# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
    handlers=[logging.StreamHandler(sys.stdout)],
)
logging.getLogger("httpx").setLevel(logging.WARNING)
The next step is to establish a connection to our Couchbase Capella cluster. This connection will allow us to interact with the database, store and retrieve documents, and perform vector searches.
try:
    # Initialize the Couchbase Cluster
    auth = PasswordAuthenticator(CB_USERNAME, CB_PASSWORD)
    options = ClusterOptions(auth)
    # Connect to the cluster
    cluster = Cluster(CB_HOST, options)
    # Wait for the cluster to be ready
    cluster.wait_until_ready(timedelta(seconds=5))
    logging.info("Successfully connected to the Couchbase cluster")
except CouchbaseException as e:
    raise RuntimeError(f"Failed to connect to Couchbase: {str(e)}")
2026-02-12 10:13:30,257 - INFO - Successfully connected to the Couchbase cluster
Before we can store our data, we need to ensure that the appropriate bucket, scope, and collection exist in our Couchbase cluster. The code below checks if these components exist and creates them if they don't, providing a foundation for storing our vector embeddings and documents.
# Create bucket if it does not exist
bucket_manager = cluster.buckets()
try:
    bucket_manager.get_bucket(CB_BUCKET_NAME)
    print(f"Bucket '{CB_BUCKET_NAME}' already exists.")
except Exception as e:
    print(f"Bucket '{CB_BUCKET_NAME}' does not exist. Creating bucket...")
    bucket_settings = CreateBucketSettings(name=CB_BUCKET_NAME, ram_quota_mb=500)
    bucket_manager.create_bucket(bucket_settings)
    print(f"Bucket '{CB_BUCKET_NAME}' created successfully.")
# Create scope and collection if they do not exist
collection_manager = cluster.bucket(CB_BUCKET_NAME).collections()
scopes = collection_manager.get_all_scopes()
scope_exists = any(scope.name == SCOPE_NAME for scope in scopes)
if scope_exists:
    print(f"Scope '{SCOPE_NAME}' already exists.")
else:
    print(f"Scope '{SCOPE_NAME}' does not exist. Creating scope...")
    collection_manager.create_scope(SCOPE_NAME)
    print(f"Scope '{SCOPE_NAME}' created successfully.")
collections = [collection.name for scope in scopes if scope.name == SCOPE_NAME for collection in scope.collections]
collection_exists = COLLECTION_NAME in collections
if collection_exists:
    print(f"Collection '{COLLECTION_NAME}' already exists in scope '{SCOPE_NAME}'.")
else:
    print(f"Collection '{COLLECTION_NAME}' does not exist in scope '{SCOPE_NAME}'. Creating collection...")
    collection_manager.create_collection(collection_name=COLLECTION_NAME, scope_name=SCOPE_NAME)
    print(f"Collection '{COLLECTION_NAME}' created successfully.")
Bucket 'vector-search-testing' already exists.
Scope 'shared' already exists.
Collection 'llamaindex' already exists in scope 'shared'.
The next step is to create or update the Vector Search Index in Couchbase from the index definition stored in llamaindex_index.json. This index is what lets the database run vector similarity searches, so documents are matched on their semantic content rather than on keywords alone. Finding semantically similar documents through their vector embeddings is the retrieval step that the rest of the RAG system depends on.
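Part of this step is re-keying the index's type mapping so that it points at your own scope and collection. The sketch below shows that operation on a plain dictionary; the mapping contents and the `rekey_type_mapping` helper are simplified, hypothetical stand-ins for the real index definition in llamaindex_index.json.

```python
def rekey_type_mapping(params, scope_name, collection_name):
    """Rename the first type-mapping key to '<scope>.<collection>' in place."""
    types = params["mapping"]["types"]
    old_key = next(iter(types))
    types[f"{scope_name}.{collection_name}"] = types.pop(old_key)
    return params

# Hypothetical, heavily simplified index parameters
params = {"mapping": {"types": {"_default._default": {"enabled": True}}}}
rekey_type_mapping(params, "shared", "llamaindex")
print(list(params["mapping"]["types"]))
```

The mapping body itself is untouched; only the key that names the scope and collection changes, which is why the same definition file can be reused across environments.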
# Create search index from llamaindex_index.json file at scope level
with open('llamaindex_index.json', 'r') as search_file:
    search_index_definition = SearchIndex.from_json(json.load(search_file))
# Update search index definition with user inputs
search_index_definition.name = INDEX_NAME
search_index_definition.source_name = CB_BUCKET_NAME
# Update types mapping
old_type_key = next(iter(search_index_definition.params['mapping']['types'].keys()))
type_obj = search_index_definition.params['mapping']['types'].pop(old_type_key)
search_index_definition.params['mapping']['types'][f"{SCOPE_NAME}.{COLLECTION_NAME}"] = type_obj
search_index_name = search_index_definition.name
# Get scope-level search manager
scope_search_manager = cluster.bucket(CB_BUCKET_NAME).scope(SCOPE_NAME).search_indexes()
# If index exists, copy its UUID so upsert can update it
try:
    existing_index = scope_search_manager.get_index(search_index_name)
    search_index_definition.uuid = existing_index.uuid
    logging.info(f"Found existing search index '{search_index_name}', updating...")
except Exception:
    logging.info(f"Search index '{search_index_name}' does not exist, creating...")
try:
    scope_search_manager.upsert_index(search_index_definition)
    logging.info(f"Search index '{search_index_name}' upserted successfully at scope level.")
except Exception as e:
    raise RuntimeError(f"Failed to upsert search index '{search_index_name}': {str(e)}")
2026-02-12 10:13:30,282 - INFO - Found existing search index 'vector_search_llamaindex', updating...
2026-02-12 10:13:30,300 - INFO - Search index 'vector_search_llamaindex' upserted successfully at scope level.
To build a RAG engine, we need data to search through. We use the BBC Realtime News dataset, a dataset with up-to-date BBC news articles grouped by month. Because these articles were created after the LLM was trained, they showcase how RAG augments the LLM with information it has not seen.
The BBC News dataset's varied content allows us to simulate real-world scenarios where users ask complex questions, so we can test how well our RAG system understands and responds to different types of queries.
try:
    news_dataset = load_dataset('RealTimeData/bbc_news_alltime', '2024-12', split="train")
    print(f"Loaded the BBC News dataset with {len(news_dataset)} rows")
except Exception as e:
    raise ValueError(f"Error loading BBC News dataset: {str(e)}")
Warning: You are sending unauthenticated requests to the HF Hub. Please set a HF_TOKEN to enable higher rate limits and faster downloads.
Loaded the BBC News dataset with 2687 rows
# Print the first two examples from the dataset
print("Dataset columns:", news_dataset.column_names)
print("\nFirst two examples:")
print(news_dataset[:2])
Dataset columns: ['title', 'published_date', 'authors', 'description', 'section', 'content', 'link', 'top_image']
First two examples:
{'title': ["Pakistan protest: Bushra Bibi's march for Imran Khan disappeared - BBC News", 'Lockdown DIY linked to Walleys Quarry gases - BBC News'], 'published_date': ['2024-12-01', '2024-12-01'], 'authors': ['https://www.facebook.com/bbcnews', 'https://www.facebook.com/bbcnews'], 'description': ["Imran Khan's third wife guided protesters to the heart of the capital - and then disappeared.", 'An academic says an increase in plasterboard sent to landfill could be behind a spike in smells.'], 'section': ['Asia', 'Stoke & Staffordshire'], 'content': ['Bushra Bibi led a protest to free Imran Khan - what happened next is a mystery\n\nImran Khan\'s wife, Bushra Bibi, encouraged protesters into the heart of Pakistan\'s capital, Islamabad\n\nA charred lorry, empty tear gas shells and posters of former Pakistan Prime Minister Imran Khan - it was all that remained of a massive protest led by Khan’s wife, Bushra Bibi, that had sent the entire capital into lockdown. Just a day earlier, faith healer Bibi - wrapped in a white shawl, her face covered by a white veil - stood atop a shipping container on the edge of the city as thousands of her husband’s devoted followers waved flags and chanted slogans beneath her. It was the latest protest to flare since Khan, the 72-year-old cricketing icon-turned-politician, was jailed more than a year ago after falling foul of the country\'s influential military which helped catapult him to power. “My children and my brothers! You have to stand with me,” Bibi cried on Tuesday afternoon, her voice cutting through the deafening roar of the crowd. “But even if you don’t,” she continued, “I will still stand firm. “This is not just about my husband. It is about this country and its leader.” It was, noted some watchers of Pakistani politics, her political debut. 
But as the sun rose on Wednesday morning, there was no sign of Bibi, nor the thousands of protesters who had marched through the country to the heart of the capital, demanding the release of their jailed leader. While other PMs have fallen out with Pakistan\'s military in the past, Khan\'s refusal to stay quiet behind bars is presenting an extraordinary challenge - escalating the standoff and leaving the country deeply divided. Exactly what happened to the so-called “final march”, and Bibi, when the city went dark is still unclear. All eyewitnesses like Samia* can say for certain is that the lights went out suddenly, plunging D Chowk, the square where they had gathered, into blackness.\n\nWithin a day of arriving, the protesters had scattered - leaving behind Bibi\'s burnt-out vehicle\n\nAs loud screams and clouds of tear gas blanketed the square, Samia describes holding her husband on the pavement, bloodied from a gun shot to his shoulder. "Everyone was running for their lives," she later told BBC Urdu from a hospital in Islamabad, adding it was "like doomsday or a war". "His blood was on my hands and the screams were unending.” But how did the tide turn so suddenly and decisively? Just hours earlier, protesters finally reached D Chowk late afternoon on Tuesday. They had overcome days of tear gas shelling and a maze of barricaded roads to get to the city centre. Many of them were supporters and workers of the Pakistan Tehreek-e-Insaf (PTI), the party led by Khan. He had called for the march from his jail cell, where he has been for more than a year on charges he says are politically motivated. Now Bibi - his third wife, a woman who had been largely shrouded in mystery and out of public view since their unexpected wedding in 2018 - was leading the charge. 
“We won’t go back until we have Khan with us,” she declared as the march reached D Chowk, deep in the heart of Islamabad’s government district.\n\nThousands had marched for days to reach Islamabad, demanding former Prime Minister Imran Khan be released from jail\n\nInsiders say even the choice of destination - a place where her husband had once led a successful sit in - was Bibi’s, made in the face of other party leader’s opposition, and appeals from the government to choose another gathering point. Her being at the forefront may have come as a surprise. Bibi, only recently released from prison herself, is often described as private and apolitical. Little is known about her early life, apart from the fact she was a spiritual guide long before she met Khan. Her teachings, rooted in Sufi traditions, attracted many followers - including Khan himself. Was she making her move into politics - or was her sudden appearance in the thick of it a tactical move to keep Imran Khan’s party afloat while he remains behind bars? For critics, it was a move that clashed with Imran Khan’s oft-stated opposition to dynastic politics. There wasn’t long to mull the possibilities. After the lights went out, witnesses say that police started firing fresh rounds of tear gas at around 21:30 local time (16:30 GMT). The crackdown was in full swing just over an hour later. At some point, amid the chaos, Bushra Bibi left. Videos on social media appeared to show her switching cars and leaving the scene. The BBC couldn’t verify the footage. By the time the dust settled, her container had already been set on fire by unknown individuals. By 01:00 authorities said all the protesters had fled.\n\nSecurity was tight in the city, and as night fell, lights were switched off - leaving many in the dark as to what exactly happened next\n\nEyewitnesses have described scenes of chaos, with tear gas fired and police rounding up protesters. 
One, Amin Khan, said from behind an oxygen mask that he joined the march knowing that, "either I will bring back Imran Khan or I will be shot". The authorities have have denied firing at the protesters. They also said some of the protesters were carrying firearms. The BBC has seen hospital records recording patients with gunshot injuries. However, government spokesperson Attaullah Tarar told the BBC that hospitals had denied receiving or treating gunshot wound victims. He added that "all security personnel deployed on the ground have been forbidden" from having live ammunition during protests. But one doctor told BBC Urdu that he had never done so many surgeries for gunshot wounds in a single night. "Some of the injured came in such critical condition that we had to start surgery right away instead of waiting for anaesthesia," he said. While there has been no official toll released, the BBC has confirmed with local hospitals that at least five people have died. Police say at least 500 protesters were arrested that night and are being held in police stations. The PTI claims some people are missing. And one person in particular hasn’t been seen in days: Bushra Bibi.\n\nThe next morning, the protesters were gone - leaving behind just wrecked cars and smashed glass\n\nOthers defended her. “It wasn’t her fault,” insisted another. “She was forced to leave by the party leaders.” Political commentators have been more scathing. “Her exit damaged her political career before it even started,” said Mehmal Sarfraz, a journalist and analyst. But was that even what she wanted? Khan has previously dismissed any thought his wife might have her own political ambitions - “she only conveys my messages,” he said in a statement attributed to him on his X account.\n\nImran Khan and Bushra Bibi, pictured here arriving at court in May 2023, married in 2018\n\nSpeaking to BBC Urdu, analyst Imtiaz Gul calls her participation “an extraordinary step in extraordinary circumstances". 
Gul believes Bushra Bibi’s role today is only about “keeping the party and its workers active during Imran Khan’s absence”. It is a feeling echoed by some PTI members, who believe she is “stepping in only because Khan trusts her deeply”. Insiders, though, had often whispered that she was pulling the strings behind the scenes - advising her husband on political appointments and guiding high-stakes decisions during his tenure. A more direct intervention came for the first time earlier this month, when she urged a meeting of PTI leaders to back Khan’s call for a rally. Pakistan’s defence minister Khawaja Asif accused her of “opportunism”, claiming she sees “a future for herself as a political leader”. But Asma Faiz, an associate professor of political science at Lahore University of Management Sciences, suspects the PTI’s leadership may have simply underestimated Bibi. “It was assumed that there was an understanding that she is a non-political person, hence she will not be a threat,” she told the AFP news agency. “However, the events of the last few days have shown a different side of Bushra Bibi.” But it probably doesn’t matter what analysts and politicians think. Many PTI supporters still see her as their connection to Imran Khan. It was clear her presence was enough to electrify the base. “She is the one who truly wants to get him out,” says Asim Ali, a resident of Islamabad. “I trust her. Absolutely!”', 'Walleys Quarry was ordered not to accept any new waste as of Friday\n\nA chemist and former senior lecturer in environmental sustainability has said powerful odours from a controversial landfill site may be linked to people doing more DIY during the Covid-19 pandemic. Complaints about Walleys Quarry in Silverdale, Staffordshire – which was ordered to close as of Friday – increased significantly during and after coronavirus lockdowns. 
Issuing the closure notice, the Environment Agency described management of the site as poor, adding it had exhausted all other enforcement tactics at premises where gases had been noxious and periodically above emission level guidelines - which some campaigners linked to ill health locally. Dr Sharon George, who used to teach at Keele University, said she had been to the site with students and found it to be clean and well-managed, and suggested an increase in plasterboard heading to landfills in 2020 could be behind a spike in stenches.\n\n“One of the materials that is particularly bad for producing odours and awful emissions is plasterboard," she said. “That’s one of the theories behind why Walleys Quarry got worse at that time.” She said the landfill was in a low-lying area, and that some of the gases that came from the site were quite heavy. “They react with water in the atmosphere, so some of the gases you smell can be quite awful and not very good for our health. “It’s why, on some days when it’s colder and muggy and a bit misty, you can smell it more.” Dr George added: “With any landfill, you’re putting things into the ground – and when you put things into the ground, if they can they will start to rot. When they start to rot they’re going to give off gases.” She believed Walleys Quarry’s proximity to people’s homes was another major factor in the amount of complaints that arose from its operation. “If you’ve got a gas that people can smell, they’re going to report it much more than perhaps a pollutant that might go unnoticed.”\n\nRebecca Currie said she did not think the site would ever be closed\n\nLocal resident and campaigner Rebecca Currie said the closure notice served to Walleys Quarry was "absolutely amazing". Her son Matthew has had breathing difficulties after being born prematurely with chronic lung disease, and Ms Currie says the site has made his symptoms worse. “I never thought this day was going to happen,” she explained. 
“We fought and fought for years.” She told BBC Midlands Today: “Our community have suffered. We\'ve got kids who are really poorly, people have moved homes.”\n\nComplaints about Walleys Quarry to Newcastle-under-Lyme Borough Council exceeded 700 in November, the highest amount since 2021 according to council leader Simon Tagg. The Environment Agency (EA), which is responsible for regulating landfill sites, said it had concluded further operation at the site could result in "significant long-term pollution". A spokesperson for Walley\'s Quarry Ltd said the firm rejected the EA\'s accusations of poor management, and would be challenging the closure notice. Dr George said she believed the EA was likely to be erring on the side of caution and public safety, adding safety standards were strict. She said a lack of landfill space in the country overall was one of the broader issues that needed addressing. “As people, we just keep using stuff and then have nowhere to put it, and then when we end up putting it in places like Walleys Quarry that is next to houses, I think that’s where the problems are.”\n\nTell us which stories we should cover in Staffordshire'], 'link': ['http://www.bbc.co.uk/news/articles/cvg02lvj1e7o', 'http://www.bbc.co.uk/news/articles/c5yg1v16nkpo'], 'top_image': ['https://ichef.bbci.co.uk/ace/standard/3840/cpsprodpb/9975/live/b22229e0-ad5a-11ef-83bc-1153ed943d1c.jpg', 'https://ichef.bbci.co.uk/ace/standard/3840/cpsprodpb/0896/live/55209f80-adb2-11ef-8f6c-f1a86bb055ec.jpg']}
We need to extract the context passages from the dataset to use as our knowledge base for the RAG system. Some articles appear more than once, so we deduplicate them by hashing their content and keeping the first occurrence of each hash.
try:
    news_articles = news_dataset
    unique_articles = {}
    for article in news_articles:
        content = article.get("content")
        if content:
            content_hash = hashlib.md5(content.encode()).hexdigest()
            if content_hash not in unique_articles:
                unique_articles[content_hash] = article
    unique_news_articles = list(unique_articles.values())
    logging.info(f"We have {len(unique_news_articles)} unique articles in our database.")
except Exception as e:
    raise RuntimeError(f"Failed to prepare data: {str(e)}")
2026-02-12 10:13:34,194 - INFO - We have 1749 unique articles in our database.
Embeddings are numerical representations of text that capture semantic meaning. Unlike keyword-based search, embeddings enable semantic search to understand context and retrieve documents that are conceptually similar even without exact keyword matches. We'll use OpenAI's text-embedding-3-large model to create high-quality embeddings with 3,072 dimensions. This model transforms our text data into vector representations that can be efficiently searched. We set embed_batch_size=30 so that texts are sent to the embedding API in batches of 30.
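To illustrate what a batch size of 30 means in practice, the sketch below groups a list of texts into batches the way a batched embedding client might. The `batched` helper is hypothetical and is not LlamaIndex's internal implementation; it also assumes one text per article, whereas the number of texts actually embedded depends on how documents are split into chunks.

```python
def batched(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

# Placeholder texts standing in for the 1749 unique articles
texts = [f"article {i}" for i in range(1749)]
batches = list(batched(texts, 30))
print(len(batches), len(batches[-1]))  # 59 batches, the last one holding 9 texts
```

Larger batches mean fewer API round trips, while smaller batches keep each request lighter; 30 is simply the value chosen in this tutorial.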
try:
    # Set up the embedding model
    embed_model = OpenAIEmbedding(
        api_key=OPENAI_API_KEY,
        embed_batch_size=30,
        model="text-embedding-3-large"
    )
    # Configure LlamaIndex to use this embedding model
    Settings.embed_model = embed_model
    print("Successfully created embedding model")
except Exception as e:
    raise ValueError(f"Error creating embedding model: {str(e)}")
Successfully created embedding model
We can test the embedding model by generating an embedding for a sample string.
try:
    test_embedding = embed_model.get_text_embedding("this is a test sentence")
    logging.info(f"Embedding dimension: {len(test_embedding)}")
except Exception as e:
    raise RuntimeError(f"Failed to generate test embedding: {str(e)}")
2026-02-12 10:13:34,683 - INFO - Embedding dimension: 3072
The vector store is set up to store the documents from the dataset. The vector store is essentially a database optimized for storing and retrieving high-dimensional vectors.
try:
    # Create the Couchbase vector store
    vector_store = CouchbaseSearchVectorStore(
        cluster=cluster,
        bucket_name=CB_BUCKET_NAME,
        scope_name=SCOPE_NAME,
        collection_name=COLLECTION_NAME,
        index_name=INDEX_NAME,
    )
    print("Successfully created vector store")
except Exception as e:
    raise ValueError(f"Failed to create vector store: {str(e)}")
Successfully created vector store
In this section, we'll process our news articles and create LlamaIndex Document objects. Each Document is created with specific metadata and formatting templates to control what the LLM and embedding model see. We'll observe examples of the formatted content to understand how the documents are structured.
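The following sketch mimics, in plain Python, how the metadata and text templates combine with the excluded-key lists to produce the text each model sees. It is a simplified illustration under the assumption that metadata entries are joined with newlines; the `render` helper and the shortened sample values are hypothetical and do not reproduce LlamaIndex's actual implementation.

```python
METADATA_TEMPLATE = "{key}=>{value}"
TEXT_TEMPLATE = "Metadata: \n{metadata_str}\n-----\nContent: {content}"

def render(text, metadata, excluded_keys):
    """Format a document, skipping metadata keys hidden from the target model."""
    metadata_str = "\n".join(
        METADATA_TEMPLATE.format(key=key, value=value)
        for key, value in metadata.items()
        if key not in excluded_keys
    )
    return TEXT_TEMPLATE.format(metadata_str=metadata_str, content=text)

# Shortened, made-up sample values
metadata = {
    "title": "Sample title",
    "description": "Sample description",
    "published_date": "2024-12-01",
    "link": "http://example.com",
}

llm_view = render("Body text", metadata, excluded_keys=["description"])
embed_view = render(
    "Body text", metadata, excluded_keys=["description", "published_date", "link"]
)
print(embed_view)
```

With these exclusion lists, the embedding model sees only the title and content, while the LLM additionally sees the publication date and link.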
llama_documents = []
# Build a LlamaIndex Document for each unique article
for article in unique_news_articles:
    try:
        document = Document(
            text=article["content"],
            metadata={
                "title": article["title"],
                "description": article["description"],
                "published_date": article["published_date"],
                "link": article["link"],
            },
            excluded_llm_metadata_keys=["description"],
            excluded_embed_metadata_keys=["description", "published_date", "link"],
            metadata_template="{key}=>{value}",
            text_template="Metadata: \n{metadata_str}\n-----\nContent: {content}",
        )
        llama_documents.append(document)
    except Exception as e:
        print(f"Failed to create document: {str(e)}")
        continue
# Observing an example of what the LLM and Embedding model receive as input
print("The LLM sees this:")
print(llama_documents[0].get_content(metadata_mode=MetadataMode.LLM))
print("The Embedding model sees this:")
print(llama_documents[0].get_content(metadata_mode=MetadataMode.EMBED))
The LLM sees this:
Metadata:
title=>Pakistan protest: Bushra Bibi's march for Imran Khan disappeared - BBC News
published_date=>2024-12-01
link=>http://www.bbc.co.uk/news/articles/cvg02lvj1e7o
-----
Content: Bushra Bibi led a protest to free Imran Khan - what happened next is a mystery
Imran Khan's wife, Bushra Bibi, encouraged protesters into the heart of Pakistan's capital, Islamabad
A charred lorry, empty tear gas shells and posters of former Pakistan Prime Minister Imran Khan - it was all that remained of a massive protest led by Khan’s wife, Bushra Bibi, that had sent the entire capital into lockdown. Just a day earlier, faith healer Bibi - wrapped in a white shawl, her face covered by a white veil - stood atop a shipping container on the edge of the city as thousands of her husband’s devoted followers waved flags and chanted slogans beneath her. It was the latest protest to flare since Khan, the 72-year-old cricketing icon-turned-politician, was jailed more than a year ago after falling foul of the country's influential military which helped catapult him to power. “My children and my brothers! You have to stand with me,” Bibi cried on Tuesday afternoon, her voice cutting through the deafening roar of the crowd. “But even if you don’t,” she continued, “I will still stand firm. “This is not just about my husband. It is about this country and its leader.” It was, noted some watchers of Pakistani politics, her political debut. But as the sun rose on Wednesday morning, there was no sign of Bibi, nor the thousands of protesters who had marched through the country to the heart of the capital, demanding the release of their jailed leader. While other PMs have fallen out with Pakistan's military in the past, Khan's refusal to stay quiet behind bars is presenting an extraordinary challenge - escalating the standoff and leaving the country deeply divided. Exactly what happened to the so-called “final march”, and Bibi, when the city went dark is still unclear. All eyewitnesses like Samia* can say for certain is that the lights went out suddenly, plunging D Chowk, the square where they had gathered, into blackness.
Within a day of arriving, the protesters had scattered - leaving behind Bibi's burnt-out vehicle
As loud screams and clouds of tear gas blanketed the square, Samia describes holding her husband on the pavement, bloodied from a gun shot to his shoulder. "Everyone was running for their lives," she later told BBC Urdu from a hospital in Islamabad, adding it was "like doomsday or a war". "His blood was on my hands and the screams were unending.” But how did the tide turn so suddenly and decisively? Just hours earlier, protesters finally reached D Chowk late afternoon on Tuesday. They had overcome days of tear gas shelling and a maze of barricaded roads to get to the city centre. Many of them were supporters and workers of the Pakistan Tehreek-e-Insaf (PTI), the party led by Khan. He had called for the march from his jail cell, where he has been for more than a year on charges he says are politically motivated. Now Bibi - his third wife, a woman who had been largely shrouded in mystery and out of public view since their unexpected wedding in 2018 - was leading the charge. “We won’t go back until we have Khan with us,” she declared as the march reached D Chowk, deep in the heart of Islamabad’s government district.
Thousands had marched for days to reach Islamabad, demanding former Prime Minister Imran Khan be released from jail
Insiders say even the choice of destination - a place where her husband had once led a successful sit in - was Bibi’s, made in the face of other party leader’s opposition, and appeals from the government to choose another gathering point. Her being at the forefront may have come as a surprise. Bibi, only recently released from prison herself, is often described as private and apolitical. Little is known about her early life, apart from the fact she was a spiritual guide long before she met Khan. Her teachings, rooted in Sufi traditions, attracted many followers - including Khan himself. Was she making her move into politics - or was her sudden appearance in the thick of it a tactical move to keep Imran Khan’s party afloat while he remains behind bars? For critics, it was a move that clashed with Imran Khan’s oft-stated opposition to dynastic politics. There wasn’t long to mull the possibilities. After the lights went out, witnesses say that police started firing fresh rounds of tear gas at around 21:30 local time (16:30 GMT). The crackdown was in full swing just over an hour later. At some point, amid the chaos, Bushra Bibi left. Videos on social media appeared to show her switching cars and leaving the scene. The BBC couldn’t verify the footage. By the time the dust settled, her container had already been set on fire by unknown individuals. By 01:00 authorities said all the protesters had fled.
Security was tight in the city, and as night fell, lights were switched off - leaving many in the dark as to what exactly happened next
Eyewitnesses have described scenes of chaos, with tear gas fired and police rounding up protesters. One, Amin Khan, said from behind an oxygen mask that he joined the march knowing that, "either I will bring back Imran Khan or I will be shot". The authorities have have denied firing at the protesters. They also said some of the protesters were carrying firearms. The BBC has seen hospital records recording patients with gunshot injuries. However, government spokesperson Attaullah Tarar told the BBC that hospitals had denied receiving or treating gunshot wound victims. He added that "all security personnel deployed on the ground have been forbidden" from having live ammunition during protests. But one doctor told BBC Urdu that he had never done so many surgeries for gunshot wounds in a single night. "Some of the injured came in such critical condition that we had to start surgery right away instead of waiting for anaesthesia," he said. While there has been no official toll released, the BBC has confirmed with local hospitals that at least five people have died. Police say at least 500 protesters were arrested that night and are being held in police stations. The PTI claims some people are missing. And one person in particular hasn’t been seen in days: Bushra Bibi.
The next morning, the protesters were gone - leaving behind just wrecked cars and smashed glass
Others defended her. “It wasn’t her fault,” insisted another. “She was forced to leave by the party leaders.” Political commentators have been more scathing. “Her exit damaged her political career before it even started,” said Mehmal Sarfraz, a journalist and analyst. But was that even what she wanted? Khan has previously dismissed any thought his wife might have her own political ambitions - “she only conveys my messages,” he said in a statement attributed to him on his X account.
Imran Khan and Bushra Bibi, pictured here arriving at court in May 2023, married in 2018
Speaking to BBC Urdu, analyst Imtiaz Gul calls her participation “an extraordinary step in extraordinary circumstances". Gul believes Bushra Bibi’s role today is only about “keeping the party and its workers active during Imran Khan’s absence”. It is a feeling echoed by some PTI members, who believe she is “stepping in only because Khan trusts her deeply”. Insiders, though, had often whispered that she was pulling the strings behind the scenes - advising her husband on political appointments and guiding high-stakes decisions during his tenure. A more direct intervention came for the first time earlier this month, when she urged a meeting of PTI leaders to back Khan’s call for a rally. Pakistan’s defence minister Khawaja Asif accused her of “opportunism”, claiming she sees “a future for herself as a political leader”. But Asma Faiz, an associate professor of political science at Lahore University of Management Sciences, suspects the PTI’s leadership may have simply underestimated Bibi. “It was assumed that there was an understanding that she is a non-political person, hence she will not be a threat,” she told the AFP news agency. “However, the events of the last few days have shown a different side of Bushra Bibi.” But it probably doesn’t matter what analysts and politicians think. Many PTI supporters still see her as their connection to Imran Khan. It was clear her presence was enough to electrify the base. “She is the one who truly wants to get him out,” says Asim Ali, a resident of Islamabad. “I trust her. Absolutely!”
The Embedding model sees this:
Metadata:
title=>Pakistan protest: Bushra Bibi's march for Imran Khan disappeared - BBC News
-----
Content: Bushra Bibi led a protest to free Imran Khan - what happened next is a mystery
Imran Khan's wife, Bushra Bibi, encouraged protesters into the heart of Pakistan's capital, Islamabad
A charred lorry, empty tear gas shells and posters of former Pakistan Prime Minister Imran Khan - it was all that remained of a massive protest led by Khan’s wife, Bushra Bibi, that had sent the entire capital into lockdown. Just a day earlier, faith healer Bibi - wrapped in a white shawl, her face covered by a white veil - stood atop a shipping container on the edge of the city as thousands of her husband’s devoted followers waved flags and chanted slogans beneath her. It was the latest protest to flare since Khan, the 72-year-old cricketing icon-turned-politician, was jailed more than a year ago after falling foul of the country's influential military which helped catapult him to power. “My children and my brothers! You have to stand with me,” Bibi cried on Tuesday afternoon, her voice cutting through the deafening roar of the crowd. “But even if you don’t,” she continued, “I will still stand firm. “This is not just about my husband. It is about this country and its leader.” It was, noted some watchers of Pakistani politics, her political debut. But as the sun rose on Wednesday morning, there was no sign of Bibi, nor the thousands of protesters who had marched through the country to the heart of the capital, demanding the release of their jailed leader. While other PMs have fallen out with Pakistan's military in the past, Khan's refusal to stay quiet behind bars is presenting an extraordinary challenge - escalating the standoff and leaving the country deeply divided. Exactly what happened to the so-called “final march”, and Bibi, when the city went dark is still unclear. All eyewitnesses like Samia* can say for certain is that the lights went out suddenly, plunging D Chowk, the square where they had gathered, into blackness.
Within a day of arriving, the protesters had scattered - leaving behind Bibi's burnt-out vehicle
As loud screams and clouds of tear gas blanketed the square, Samia describes holding her husband on the pavement, bloodied from a gun shot to his shoulder. "Everyone was running for their lives," she later told BBC Urdu from a hospital in Islamabad, adding it was "like doomsday or a war". "His blood was on my hands and the screams were unending.” But how did the tide turn so suddenly and decisively? Just hours earlier, protesters finally reached D Chowk late afternoon on Tuesday. They had overcome days of tear gas shelling and a maze of barricaded roads to get to the city centre. Many of them were supporters and workers of the Pakistan Tehreek-e-Insaf (PTI), the party led by Khan. He had called for the march from his jail cell, where he has been for more than a year on charges he says are politically motivated. Now Bibi - his third wife, a woman who had been largely shrouded in mystery and out of public view since their unexpected wedding in 2018 - was leading the charge. “We won’t go back until we have Khan with us,” she declared as the march reached D Chowk, deep in the heart of Islamabad’s government district.
Thousands had marched for days to reach Islamabad, demanding former Prime Minister Imran Khan be released from jail
Insiders say even the choice of destination - a place where her husband had once led a successful sit in - was Bibi’s, made in the face of other party leader’s opposition, and appeals from the government to choose another gathering point. Her being at the forefront may have come as a surprise. Bibi, only recently released from prison herself, is often described as private and apolitical. Little is known about her early life, apart from the fact she was a spiritual guide long before she met Khan. Her teachings, rooted in Sufi traditions, attracted many followers - including Khan himself. Was she making her move into politics - or was her sudden appearance in the thick of it a tactical move to keep Imran Khan’s party afloat while he remains behind bars? For critics, it was a move that clashed with Imran Khan’s oft-stated opposition to dynastic politics. There wasn’t long to mull the possibilities. After the lights went out, witnesses say that police started firing fresh rounds of tear gas at around 21:30 local time (16:30 GMT). The crackdown was in full swing just over an hour later. At some point, amid the chaos, Bushra Bibi left. Videos on social media appeared to show her switching cars and leaving the scene. The BBC couldn’t verify the footage. By the time the dust settled, her container had already been set on fire by unknown individuals. By 01:00 authorities said all the protesters had fled.
Security was tight in the city, and as night fell, lights were switched off - leaving many in the dark as to what exactly happened next
Eyewitnesses have described scenes of chaos, with tear gas fired and police rounding up protesters. One, Amin Khan, said from behind an oxygen mask that he joined the march knowing that, "either I will bring back Imran Khan or I will be shot". The authorities have have denied firing at the protesters. They also said some of the protesters were carrying firearms. The BBC has seen hospital records recording patients with gunshot injuries. However, government spokesperson Attaullah Tarar told the BBC that hospitals had denied receiving or treating gunshot wound victims. He added that "all security personnel deployed on the ground have been forbidden" from having live ammunition during protests. But one doctor told BBC Urdu that he had never done so many surgeries for gunshot wounds in a single night. "Some of the injured came in such critical condition that we had to start surgery right away instead of waiting for anaesthesia," he said. While there has been no official toll released, the BBC has confirmed with local hospitals that at least five people have died. Police say at least 500 protesters were arrested that night and are being held in police stations. The PTI claims some people are missing. And one person in particular hasn’t been seen in days: Bushra Bibi.
The next morning, the protesters were gone - leaving behind just wrecked cars and smashed glass
Others defended her. “It wasn’t her fault,” insisted another. “She was forced to leave by the party leaders.” Political commentators have been more scathing. “Her exit damaged her political career before it even started,” said Mehmal Sarfraz, a journalist and analyst. But was that even what she wanted? Khan has previously dismissed any thought his wife might have her own political ambitions - “she only conveys my messages,” he said in a statement attributed to him on his X account.
Imran Khan and Bushra Bibi, pictured here arriving at court in May 2023, married in 2018
Speaking to BBC Urdu, analyst Imtiaz Gul calls her participation “an extraordinary step in extraordinary circumstances". Gul believes Bushra Bibi’s role today is only about “keeping the party and its workers active during Imran Khan’s absence”. It is a feeling echoed by some PTI members, who believe she is “stepping in only because Khan trusts her deeply”. Insiders, though, had often whispered that she was pulling the strings behind the scenes - advising her husband on political appointments and guiding high-stakes decisions during his tenure. A more direct intervention came for the first time earlier this month, when she urged a meeting of PTI leaders to back Khan’s call for a rally. Pakistan’s defence minister Khawaja Asif accused her of “opportunism”, claiming she sees “a future for herself as a political leader”. But Asma Faiz, an associate professor of political science at Lahore University of Management Sciences, suspects the PTI’s leadership may have simply underestimated Bibi. “It was assumed that there was an understanding that she is a non-political person, hence she will not be a threat,” she told the AFP news agency. “However, the events of the last few days have shown a different side of Bushra Bibi.” But it probably doesn’t matter what analysts and politicians think. Many PTI supporters still see her as their connection to Imran Khan. It was clear her presence was enough to electrify the base. “She is the one who truly wants to get him out,” says Asim Ali, a resident of Islamabad. “I trust her. Absolutely!”

In this section, we'll create an ingestion pipeline to process our documents. The pipeline will split the documents into nodes with SentenceSplitter, generate an embedding for each node with our embedding model, and store the nodes in the Couchbase vector store.
This process transforms our raw documents into a searchable knowledge base that can be queried semantically.
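To make the split, embed, and store stages concrete, here is a minimal, dependency-free sketch of the same flow. It is an illustration only: `split_sentences`, `toy_embed`, and `ingest` are hypothetical helpers, the hash-based "embedding" is a stand-in for a real embedding model, and a plain list stands in for the vector store. It is not how LlamaIndex or Couchbase implement these steps.

```python
import hashlib
import re


def split_sentences(text, max_chars=50):
    """Greedy chunker (assumption): pack whole sentences into chunks of at most max_chars."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + 1 + len(sentence) > max_chars:
            # Adding this sentence would overflow the chunk, so start a new one
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks


def toy_embed(text, dims=8):
    """Stand-in for an embedding model: a deterministic vector derived from a hash."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [byte / 255 for byte in digest[:dims]]


def ingest(documents, store):
    """Split each document, embed each chunk, and append one record per chunk."""
    for doc_id, text in documents.items():
        for i, chunk in enumerate(split_sentences(text)):
            store.append(
                {"id": f"{doc_id}-{i}", "text": chunk, "embedding": toy_embed(chunk)}
            )
    return store


docs = {
    "a": "RAG retrieves context. The LLM writes the answer. Embeddings make search semantic."
}
store = ingest(docs, [])
for record in store:
    print(record["id"], "|", record["text"])
```

Each record pairs a chunk of text with its vector, which is the shape of data a vector store needs in order to answer similarity queries later.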
try:
    # Process documents: split into nodes, generate embeddings, and store in vector database
    index_pipeline = IngestionPipeline(
        transformations=[SentenceSplitter(), embed_model],
        vector_store=vector_store,
    )
    nodes = index_pipeline.run(documents=llama_documents)
    logging.info(f"Successfully ingested {len(nodes)} nodes into the vector store.")
except Exception as e:
    raise RuntimeError(f"Failed to run ingestion pipeline: {str(e)}")

2026-02-12 10:14:44,691 - INFO - Successfully ingested 2329 nodes into the vector store.

Large language models are AI systems that are trained to understand and generate human language. We'll be using OpenAI's gpt-4o model to process user queries and generate meaningful responses based on the retrieved context from our Couchbase vector store. This model is a key component of our RAG system, allowing it to go beyond simple keyword matching and truly understand the intent behind a query. By integrating OpenAI's LLM, we equip our RAG system with the ability to interpret complex queries, understand the nuances of language, and provide more accurate and contextually relevant responses.
The language model's ability to understand context and generate coherent responses is what makes our RAG system truly intelligent. It can not only find the right information but also present it in a way that is useful and understandable to the user.
The LLM is configured using LlamaIndex's OpenAI LLM class with your OpenAI API key and the gpt-4o model name.
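Because the LLM setup depends on an API key being available, it can help to fail early with a clear message when the key is missing. The following is a small sketch under assumptions: `load_api_key` is a hypothetical helper, and reading the key from an environment variable named `OPENAI_API_KEY` is one common convention, not necessarily how this notebook obtains it.

```python
import os


def load_api_key(env_var="OPENAI_API_KEY", environ=None):
    """Return the API key from the environment, or raise if it is missing or blank.

    `environ` can be passed explicitly (for example in tests); it defaults to os.environ.
    """
    environ = os.environ if environ is None else environ
    key = environ.get(env_var, "").strip()
    if not key:
        raise ValueError(f"{env_var} is not set; export it before creating the LLM")
    return key


# Example with an explicit mapping instead of the real environment
print(load_api_key(environ={"OPENAI_API_KEY": " sk-example "}))
```

Validating the key up front keeps a configuration problem from surfacing later as a less obvious authentication error on the first request.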
try:
    # Set up the LLM
    llm = OpenAI(
        api_key=OPENAI_API_KEY,
        model="gpt-4o",
    )
    # Configure LlamaIndex to use this LLM
    Settings.llm = llm
    logging.info("Successfully created the OpenAI LLM")
except Exception as e:
    raise ValueError(f"Error creating OpenAI LLM: {str(e)}")

2026-02-12 10:14:44,697 - INFO - Successfully created the OpenAI LLM

In this section, we'll create a VectorStoreIndex from our Couchbase vector store. This index serves as the foundation for our RAG system, enabling semantic search capabilities and efficient retrieval of relevant information.
The VectorStoreIndex provides a high-level interface to our vector store; here we use it to create the query engine for our RAG system.
try:
    # Create your index
    index = VectorStoreIndex.from_vector_store(vector_store)
    rag = index.as_query_engine()
    logging.info("Successfully created vector store index and query engine.")
except Exception as e:
    raise RuntimeError(f"Failed to create vector store index: {str(e)}")

2026-02-12 10:14:44,702 - INFO - Successfully created vector store index and query engine.

Let's test our RAG system by performing a semantic search on a sample query. In this example, we'll use a question about Pep Guardiola's reaction to Manchester City's recent form. The RAG system will embed the query, retrieve the most similar nodes from the vector store, and pass them to the LLM as context for generating the answer.
This demonstrates how our system combines the power of vector search with language model capabilities to provide accurate, contextual answers based on the information in our database.
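The retrieve-then-generate idea can be sketched without any external services. The example below is a simplified illustration, not the query engine's actual implementation: `cosine`, `retrieve`, and `build_prompt` are hypothetical helpers, the two-dimensional vectors are made-up stand-ins for real embeddings, and the prompt layout is an assumption rather than LlamaIndex's template.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, records, top_k=2):
    """Return the top_k records whose embeddings are most similar to the query vector."""
    ranked = sorted(
        records, key=lambda r: cosine(query_vec, r["embedding"]), reverse=True
    )
    return ranked[:top_k]


def build_prompt(question, contexts):
    """Place the retrieved chunks ahead of the question (assumed prompt layout)."""
    context_block = "\n".join(f"- {c['text']}" for c in contexts)
    return f"Context:\n{context_block}\n\nQuestion: {question}\nAnswer:"


# Toy records with hand-picked vectors; a real system would use model embeddings
records = [
    {"text": "chunk about football", "embedding": [1.0, 0.0]},
    {"text": "chunk about politics", "embedding": [0.0, 1.0]},
    {"text": "chunk about managers", "embedding": [0.8, 0.6]},
]
query_vec = [1.0, 0.1]

top = retrieve(query_vec, records)
prompt = build_prompt("What did the manager say?", top)
print(prompt)
```

Only the chunks closest to the query reach the prompt, which is why retrieval quality largely determines how well grounded the generated answer is.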
# Sample query from the dataset
query = "What was Pep Guardiola's reaction to Manchester City's recent form?"
try:
    # Perform the semantic search
    start_time = time.time()
    response = rag.query(query)
    search_elapsed_time = time.time() - start_time
    # Display search results
    print(f"\nSemantic Search Results (completed in {search_elapsed_time:.2f} seconds):")
    print(response)
except Exception as e:
    raise RuntimeError(f"Error performing semantic search: {e}")

Semantic Search Results (completed in 2.72 seconds):
Pep Guardiola expressed that he is "fine" but admitted that Manchester City's recent poor form has affected his sleep and diet. He described his state of mind as "ugly" and mentioned that his sleep was "worse" and he was eating lighter due to digestion issues. Despite the challenges, he emphasized the need for the team to defend better and avoid mistakes.

You've built a RAG system using Couchbase Search Vector Index with OpenAI and LlamaIndex. For the Hyperscale/Composite Vector Index alternative, see the query_based tutorial.