Retrieval-Augmented Generation (RAG) with Couchbase and Azure OpenAI using GSI index

Learn how to build a semantic search engine using Couchbase and Azure OpenAI using GSI.
This tutorial demonstrates how to integrate Couchbase's vector search capabilities with Azure OpenAI embeddings.
You'll understand how to perform Retrieval-Augmented Generation (RAG) using LangChain and Couchbase.

Introduction

In this guide, we will walk you through building a powerful semantic search engine using Couchbase as the backend database, AzureOpenAI as the AI-powered embedding and language model provider. Semantic search goes beyond simple keyword matching by understanding the context and meaning behind the words in a query, making it an essential tool for applications that require intelligent information retrieval. This tutorial is designed to be beginner-friendly, with clear, step-by-step instructions that will equip you with the knowledge to create a fully functional semantic search system using GSI( Global Secondary Index) from scratch. Alternatively if you want to perform semantic search using the FTS index, please take a look at this.

How to run this tutorial

This tutorial is available as a Jupyter Notebook (.ipynb file) that you can run interactively. You can access the original notebook here.

You can either download the notebook file and run it on Google Colab or run it on your system by setting up the Python environment.

Before you start

Get Credentials for Azure OpenAI

Please follow the instructions to generate the Azure OpenAI credentials.

Create and Deploy Your Free Tier Operational cluster on Capella

To get started with Couchbase Capella, create an account and use it to deploy a forever free tier operational cluster. This account provides you with a environment where you can explore and learn about Capella with no time constraint.

To know more, please follow the instructions.

Note: To run this this tutorial, you will need Capella with Couchbase Server version 8.0 or above as GSI vector search is supported only from version 8.0

Couchbase Capella Configuration

When running Couchbase using Capella, the following prerequisites need to be met.

Create the database credentials to access the travel-sample bucket (Read and Write) used in the application.
Allow access to the Cluster from the IP on which the application is running.

Setting the Stage: Installing Necessary Libraries

To build our semantic search engine, we need a robust set of tools. The libraries we install handle everything from connecting to databases to performing complex machine learning tasks. Each library has a specific role: Couchbase libraries manage database operations, LangChain handles AI model integrations, and AzureOpenAI provides advanced AI models for generating embeddings and understanding natural language. By setting up these libraries, we ensure our environment is equipped to handle the data-intensive and computationally complex tasks required for semantic search.

!pip install --quiet datasets==3.5.0 langchain-couchbase==0.5.0 langchain-openai==0.3.32

Importing Necessary Libraries

The script starts by importing a series of libraries required for various tasks, including handling JSON, logging, time tracking, Couchbase connections, embedding generation, and dataset loading. These libraries provide essential functions for working with data, managing database connections, and processing machine learning models.

import getpass
import json
import logging
import sys
import os
import time
from datetime import timedelta
from uuid import uuid4

from couchbase.auth import PasswordAuthenticator
from couchbase.cluster import Cluster
from couchbase.exceptions import (
    CouchbaseException,
    InternalServerFailureException,
    QueryIndexAlreadyExistsException,
)
from couchbase.options import ClusterOptions
from datasets import load_dataset
from langchain_core.documents import Document
from langchain_core.globals import set_llm_cache
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts.chat import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_couchbase.cache import CouchbaseCache
from langchain_couchbase.vectorstores import CouchbaseQueryVectorStore
from langchain_couchbase.vectorstores import DistanceStrategy
from langchain_openai import AzureChatOpenAI, AzureOpenAIEmbeddings
from langchain_couchbase.vectorstores import IndexType
from tqdm import tqdm

Setup Logging

Logging is configured to track the progress of the script and capture any errors or warnings. This is crucial for debugging and understanding the flow of execution. The logging output includes timestamps, log levels (e.g., INFO, ERROR), and messages that describe what is happening in the script.

logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s', force=True)

# Suppress verbose HTTP request logging
logging.getLogger("httpx").setLevel(logging.WARNING)
logging.getLogger("openai").setLevel(logging.WARNING) 
logging.getLogger("urllib3").setLevel(logging.WARNING)
logging.getLogger("azure").setLevel(logging.WARNING)

Loading Sensitive Information

In this section, we prompt the user to input essential configuration settings needed. These settings include sensitive information like API keys, database credentials, and specific configuration names. Instead of hardcoding these details into the script, we request the user to provide them at runtime, ensuring flexibility and security.

The script also validates that all required inputs are provided, raising an error if any crucial information is missing. This approach ensures that your integration is both secure and correctly configured without hardcoding sensitive information, enhancing the overall security and maintainability of your code.

AZURE_OPENAI_KEY = os.getenv('AZURE_OPENAI_KEY') or getpass.getpass('Enter your Azure OpenAI Key: ')
AZURE_OPENAI_ENDPOINT = os.getenv('AZURE_OPENAI_ENDPOINT') or input('Enter your Azure OpenAI Endpoint: ')
AZURE_OPENAI_EMBEDDING_DEPLOYMENT = os.getenv('AZURE_OPENAI_EMBEDDING_DEPLOYMENT') or input('Enter your Azure OpenAI Embedding Deployment: ')
AZURE_OPENAI_CHAT_DEPLOYMENT = os.getenv('AZURE_OPENAI_CHAT_DEPLOYMENT') or input('Enter your Azure OpenAI Chat Deployment: ')

CB_HOST = os.getenv('CB_HOST') or input('Enter your Couchbase host (default: couchbase://localhost): ') or 'couchbase://localhost'
CB_USERNAME = os.getenv('CB_USERNAME') or input('Enter your Couchbase username (default: Administrator): ') or 'Administrator'
CB_PASSWORD = os.getenv('CB_PASSWORD') or getpass.getpass('Enter your Couchbase password (default: password): ') or 'password'
CB_BUCKET_NAME = os.getenv('CB_BUCKET_NAME') or input('Enter your Couchbase bucket name (default: query-vector-search-testing): ') or 'query-vector-search-testing'
SCOPE_NAME = os.getenv('SCOPE_NAME') or input('Enter your scope name (default: shared): ') or 'shared'
COLLECTION_NAME = os.getenv('COLLECTION_NAME') or input('Enter your collection name (default: azure): ') or 'azure'
CACHE_COLLECTION = os.getenv('CACHE_COLLECTION') or input('Enter your cache collection name (default: cache): ') or 'cache'

# Check if the variables are correctly loaded
if not all([AZURE_OPENAI_KEY, AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_EMBEDDING_DEPLOYMENT, AZURE_OPENAI_CHAT_DEPLOYMENT]):
    raise ValueError("Missing required Azure OpenAI variables")

Connecting to the Couchbase Cluster

Connecting to a Couchbase cluster is the foundation of our project. Couchbase will serve as our primary data store, handling all the storage and retrieval operations required for our semantic search engine. By establishing this connection, we enable our application to interact with the database, allowing us to perform operations such as storing embeddings, querying data, and managing collections. This connection is the gateway through which all data will flow, so ensuring it's set up correctly is paramount.

try:
    auth = PasswordAuthenticator(CB_USERNAME, CB_PASSWORD)
    options = ClusterOptions(auth)
    cluster = Cluster(CB_HOST, options)
    cluster.wait_until_ready(timedelta(seconds=5))
    logging.info("Successfully connected to Couchbase")
except Exception as e:
    raise ConnectionError(f"Failed to connect to Couchbase: {str(e)}")

2025-09-22 12:23:15,245 - INFO - Successfully connected to Couchbase

Setting Up Collections in Couchbase

The setup_collection() function handles creating and configuring the hierarchical data organization in Couchbase:

Bucket Creation:
- Checks if specified bucket exists, creates it if not
- Sets bucket properties like RAM quota (1024MB) and replication (disabled)
- Note: You will not be able to create a bucket on Capella
Scope Management:
- Verifies if requested scope exists within bucket
- Creates new scope if needed (unless it's the default "_default" scope)
Collection Setup:
- Checks for collection existence within scope
- Creates collection if it doesn't exist
- Waits 2 seconds for collection to be ready

Additional Tasks:

Clears any existing documents for clean state
Implements comprehensive error handling and logging

The function is called twice to set up:

Main collection for vector embeddings
Cache collection for storing results

def setup_collection(cluster, bucket_name, scope_name, collection_name):
    try:
        # Check if bucket exists, create if it doesn't
        try:
            bucket = cluster.bucket(bucket_name)
            logging.info(f"Bucket '{bucket_name}' exists.")
        except Exception as e:
            logging.info(f"Bucket '{bucket_name}' does not exist. Creating it...")
            bucket_settings = CreateBucketSettings(
                name=bucket_name,
                bucket_type='couchbase',
                ram_quota_mb=1024,
                flush_enabled=True,
                num_replicas=0
            )
            cluster.buckets().create_bucket(bucket_settings)
            time.sleep(2)  # Wait for bucket creation to complete and become available
            bucket = cluster.bucket(bucket_name)
            logging.info(f"Bucket '{bucket_name}' created successfully.")

        bucket_manager = bucket.collections()

        # Check if scope exists, create if it doesn't
        scopes = bucket_manager.get_all_scopes()
        scope_exists = any(scope.name == scope_name for scope in scopes)
        
        if not scope_exists and scope_name != "_default":
            logging.info(f"Scope '{scope_name}' does not exist. Creating it...")
            bucket_manager.create_scope(scope_name)
            logging.info(f"Scope '{scope_name}' created successfully.")

        # Check if collection exists, create if it doesn't
        collections = bucket_manager.get_all_scopes()
        collection_exists = any(
            scope.name == scope_name and collection_name in [col.name for col in scope.collections]
            for scope in collections
        )

        if not collection_exists:
            logging.info(f"Collection '{collection_name}' does not exist. Creating it...")
            bucket_manager.create_collection(scope_name, collection_name)
            logging.info(f"Collection '{collection_name}' created successfully.")
        else:
            logging.info(f"Collection '{collection_name}' already exists. Skipping creation.")

        # Wait for collection to be ready
        collection = bucket.scope(scope_name).collection(collection_name)
        time.sleep(2)  # Give the collection time to be ready for queries

        # Clear all documents in the collection
        try:
            query = f"DELETE FROM `{bucket_name}`.`{scope_name}`.`{collection_name}`"
            cluster.query(query).execute()
            logging.info("All documents cleared from the collection.")
        except Exception as e:
            logging.warning(f"Error while clearing documents: {str(e)}. The collection might be empty.")

        return collection
    except Exception as e:
        raise RuntimeError(f"Error setting up collection: {str(e)}")
    
setup_collection(cluster, CB_BUCKET_NAME, SCOPE_NAME, COLLECTION_NAME)
setup_collection(cluster, CB_BUCKET_NAME, SCOPE_NAME, CACHE_COLLECTION)

2025-09-22 12:23:20,911 - INFO - Bucket 'query-vector-search-testing' exists.
2025-09-22 12:23:20,927 - INFO - Collection 'azure' already exists. Skipping creation.


2025-09-22 12:23:23,264 - INFO - All documents cleared from the collection.
2025-09-22 12:23:23,265 - INFO - Bucket 'query-vector-search-testing' exists.
2025-09-22 12:23:23,280 - INFO - Collection 'cache' already exists. Skipping creation.
2025-09-22 12:23:25,419 - INFO - All documents cleared from the collection.





<couchbase.collection.Collection at 0x15f303390>

Load the BBC News Dataset

To build a search engine, we need data to search through. We use the BBC News dataset from RealTimeData, which provides real-world news articles. This dataset contains news articles from BBC covering various topics and time periods. Loading the dataset is a crucial step because it provides the raw material that our search engine will work with. The quality and diversity of the news articles make it an excellent choice for testing and refining our search engine, ensuring it can handle real-world news content effectively.

The BBC News dataset allows us to work with authentic news articles, enabling us to build and test a search engine that can effectively process and retrieve relevant news content. The dataset is loaded using the Hugging Face datasets library, specifically accessing the "RealTimeData/bbc_news_alltime" dataset with the "2024-12" version.

try:
    news_dataset = load_dataset(
        "RealTimeData/bbc_news_alltime", "2024-12", split="train"
    )
    print(f"Loaded the BBC News dataset with {len(news_dataset)} rows")
    logging.info(f"Successfully loaded the BBC News dataset with {len(news_dataset)} rows.")
except Exception as e:
    raise ValueError(f"Error loading the BBC News dataset: {str(e)}")

2025-09-22 12:23:43,453 - INFO - Successfully loaded the BBC News dataset with 2687 rows.


Loaded the BBC News dataset with 2687 rows

Cleaning up the Data

We will use the content of the news articles for our RAG system.

The dataset contains a few duplicate records. We are removing them to avoid duplicate results in the retrieval stage of our RAG system.

news_articles = news_dataset["content"]
unique_articles = set()
for article in news_articles:
    if article:
        unique_articles.add(article)
unique_news_articles = list(unique_articles)
print(f"We have {len(unique_news_articles)} unique articles in our database.")

We have 1749 unique articles in our database.

Creating AzureOpenAI Embeddings

Embeddings are at the heart of semantic search. They are numerical representations of text that capture the semantic meaning of the words and phrases. Unlike traditional keyword-based search, which looks for exact matches, embeddings allow our search engine to understand the context and nuances of language, enabling it to retrieve documents that are semantically similar to the query, even if they don't contain the exact keywords. By creating embeddings using AzureOpenAI, we equip our search engine with the ability to understand and process natural language in a way that's much closer to how humans understand language. This step transforms our raw text data into a format that the search engine can use to find and rank relevant documents.

try:
    embeddings = AzureOpenAIEmbeddings(
        deployment=AZURE_OPENAI_EMBEDDING_DEPLOYMENT,
        openai_api_key=AZURE_OPENAI_KEY,
        azure_endpoint=AZURE_OPENAI_ENDPOINT
    )
    logging.info("Successfully created AzureOpenAIEmbeddings")
except Exception as e:
    raise ValueError(f"Error creating AzureOpenAIEmbeddings: {str(e)}")

2025-09-22 12:23:51,333 - INFO - Successfully created AzureOpenAIEmbeddings

Setting Up the Couchbase Query Vector Store

A vector store is where we'll keep our embeddings. The query vector store is specifically designed to handle embeddings and perform similarity searches. When a user inputs a query, GSI converts the query into an embedding and compares it against the embeddings stored in the vector store. This allows the engine to find documents that are semantically similar to the query, even if they don't contain the exact same words. By setting up the vector store in Couchbase, we create a powerful tool that enables us to understand and retrieve information based on the meaning and context of the query, rather than just the specific words used.

The vector store requires a distance metric to determine how similarity between vectors is calculated. This is crucial for accurate semantic search results as different distance metrics can yield different similarity rankings. Some of the supported Distance strategies are dot, l2, euclidean, cosine, l2_squared, euclidean_squared. In our implementation we will use cosine which is particularly effective for text embeddings.

try:
    vector_store = CouchbaseQueryVectorStore(
        cluster=cluster,
        bucket_name=CB_BUCKET_NAME,
        scope_name=SCOPE_NAME,
        collection_name=COLLECTION_NAME,
        embedding = embeddings,
        distance_metric=DistanceStrategy.COSINE
    )
    logging.info("Successfully created vector store")
except Exception as e:
    raise ValueError(f"Failed to create vector store: {str(e)}")

2025-09-22 12:24:25,546 - INFO - Successfully created vector store

Saving Data to the Vector Store

To efficiently handle the large number of articles, we process them in batches of 50 articles at a time. This batch processing approach helps manage memory usage and provides better control over the ingestion process.

We first filter out any articles that exceed 50,000 characters to avoid potential issues with token limits. Then, using the vector store's add_texts method, we add the filtered articles to our vector database. The batch_size parameter controls how many articles are processed in each iteration.

This approach offers several benefits:

Memory Efficiency: Processing in smaller batches prevents memory overload
Progress Tracking: Easier to monitor and track the ingestion progress
Resource Management: Better control over CPU and network resource utilization

We use a conservative batch size of 50 to ensure reliable operation. The optimal batch size depends on many factors including:

Document sizes being inserted
Available system resources
Network conditions
Concurrent workload

Consider measuring performance with your specific workload before adjusting.

batch_size = 50

# Automatic Batch Processing
articles = [article for article in unique_news_articles if article and len(article) <= 50000]

try:
    vector_store.add_texts(
        texts=articles,
        batch_size=batch_size
    )
    logging.info("Document ingestion completed successfully.")
except Exception as e:
    raise ValueError(f"Failed to save documents to vector store: {str(e)}")

2025-09-22 12:36:18,756 - INFO - Document ingestion completed successfully.

Using the AzureChatOpenAI Language Model (LLM)

Language models are AI systems that are trained to understand and generate human language. We'll be using AzureChatOpenAI language model to process user queries and generate meaningful responses. This model is a key component of our semantic search engine, allowing it to go beyond simple keyword matching and truly understand the intent behind a query. By creating this language model, we equip our search engine with the ability to interpret complex queries, understand the nuances of language, and provide more accurate and contextually relevant responses.

The language model's ability to understand context and generate coherent responses is what makes our search engine truly intelligent. It can not only find the right information but also present it in a way that is useful and understandable to the user.

try:
    llm = AzureChatOpenAI(
        deployment_name=AZURE_OPENAI_CHAT_DEPLOYMENT,
        openai_api_key=AZURE_OPENAI_KEY,
        azure_endpoint=AZURE_OPENAI_ENDPOINT,
        openai_api_version="2024-10-21"
    )
    logging.info("Successfully created Azure OpenAI Chat model")
except Exception as e:
    raise ValueError(f"Error creating Azure OpenAI Chat model: {str(e)}")

2025-09-22 12:39:45,695 - INFO - Successfully created Azure OpenAI Chat model

Perform Semantic Search

Semantic search in Couchbase involves converting queries and documents into vector representations using an embeddings model. These vectors capture the semantic meaning of the text and are stored directly in Couchbase. When a query is made, Couchbase performs a similarity search by comparing the query vector against the stored document vectors. The similarity metric used for this comparison is configurable, allowing flexibility in how the relevance of documents is determined. Common metrics include cosine similarity, Euclidean distance, or dot product, but other metrics can be implemented based on specific use cases. Different embedding models like BERT, Word2Vec, or GloVe can also be used depending on the application's needs, with the vectors generated by these models stored and searched within Couchbase itself.

In the provided code, the search process begins by recording the start time, followed by executing the similarity_search_with_score method of the CouchbaseQueryVectorStore. This method searches Couchbase for the most relevant documents based on the vector similarity to the query. The search results include the document content and the distance that reflects how closely each document aligns with the query in the defined semantic space. The time taken to perform this search is then calculated and logged, and the results are displayed, showing the most relevant documents along with their similarity scores. This approach leverages Couchbase as both a storage and retrieval engine for vector data, enabling efficient and scalable semantic searches. The integration of vector storage and search capabilities within Couchbase allows for sophisticated semantic search operations without relying on external services for vector storage or comparison.

query = "What were Luke Littler's key achievements and records in his recent PDC World Championship match?"

try:
    # Perform the semantic search
    start_time = time.time()
    search_results = vector_store.similarity_search_with_score(query, k=10)
    search_elapsed_time = time.time() - start_time

    logging.info(f"Semantic search completed in {search_elapsed_time:.2f} seconds")

    # Display search results
    print(f"\nSemantic Search Results (completed in {search_elapsed_time:.2f} seconds):")
    for doc, score in search_results:
        print(f"Distance: {score:.4f}, Text: {doc.page_content}")

except CouchbaseException as e:
    raise RuntimeError(f"Error performing semantic search: {str(e)}")
except Exception as e:
    raise RuntimeError(f"Unexpected error: {str(e)}")

2025-09-22 12:41:51,036 - INFO - Semantic search completed in 2.55 seconds



Semantic Search Results (completed in 2.55 seconds):
Distance: 0.3697, Text: The Littler effect - how darts hit the bullseye

Teenager Luke Littler began his bid to win the 2025 PDC World Darts Championship with a second-round win against Ryan Meikle. Here we assess Littler's impact after a remarkable rise which saw him named BBC Young Sports Personality of the Year and runner-up in the main award to athlete Keely Hodgkinson.

One year ago, he was barely a household name in his own home. Now he is a sporting phenomenon. After emerging from obscurity aged 16 to reach the World Championship final, the life of Luke Littler and the sport he loves has been transformed. Viewing figures, ticket sales and social media interest have rocketed. Darts has hit the bullseye. This Christmas more than 100,000 children are expected to be opening Littler-branded magnetic dartboards as presents. His impact has helped double the number of junior academies, prompted plans to expand the World Championship and generated interest in darts from Saudi Arabian backers.

Just months after taking his GCSE exams and ranked 164th in the world, Littler beat former champions Raymond van Barneveld and Rob Cross en route to the PDC World Championship final in January, before his run ended with a 7-4 loss to Luke Humphries. With his nickname 'The Nuke' on his purple and yellow shirt and the Alexandra Palace crowd belting out his walk-on song, Pitbull's tune Greenlight, he became an instant hit. Electric on the stage, calm off it. The down-to-earth teenager celebrated with a kebab and computer games. "We've been watching his progress since he was about seven. He was on our radar, but we never anticipated what would happen. The next thing we know 'Littlermania' is spreading everywhere," PDC president Barry Hearn told BBC Sport. A peak TV audience of 3.7 million people watched the final - easily Sky's biggest figure for a non-football sporting event. The teenager from Warrington in Cheshire was too young to legally drive or drink alcohol, but earned £200,000 for finishing second - part of £1m prize money in his first year as a professional - and an invitation to the elite Premier League competition. He turned 17 later in January but was he too young for the demanding event over 17 Thursday nights in 17 locations? He ended up winning the whole thing, and hit a nine-dart finish against Humphries in the final. From Bahrain to Wolverhampton, Littler claimed 10 titles in 2024 and is now eyeing the World Championship.

As he progressed at the Ally Pally, the Manchester United fan was sent a good luck message by the club's former midfielder and ex-England captain David Beckham. In 12 months, Littler's Instagram followers have risen from 4,000 to 1.3m. Commercial backers include a clothing range, cereal firm and train company and he will appear in a reboot of the TV darts show Bullseye. Google say he was the most searched-for athlete online in the UK during 2024. On the back of his success, Littler darts, boards, cabinets, shirts are being snapped up in big numbers. "This Christmas the junior magnetic dartboard is selling out, we're talking over 100,000. They're 20 quid and a great introduction for young children," said Garry Plummer, the boss of sponsors Target Darts, who first signed a deal with Littler's family when he was aged 12. "All the toy shops want it, they all want him - 17, clean, doesn't drink, wonderful."

Littler beat Luke Humphries to win the Premier League title in May

The number of academies for children under the age of 16 has doubled in the last year, says Junior Darts Corporation chairman Steve Brown. There are 115 dedicated groups offering youngsters equipment, tournaments and a place to develop, with bases including Australia, Bulgaria, Greece, Norway, USA and Mongolia. "We've seen so many inquiries from around the world, it's been such a boom. It took us 14 years to get 1,600 members and within 12 months we have over 3,000, and waiting lists," said Brown. "When I played darts as a child, I was quite embarrassed to tell my friends what my hobby was. All these kids playing darts now are pretty popular at school. It's a bit rock 'n roll and recognised as a cool thing to do." Plans are being hatched to extend the World Championship by four days and increase the number of players from 96 to 128. That will boost the number of tickets available by 25,000 to 115,000 but Hearn reckons he could sell three times as many. He says Saudi Arabia wants to host a tournament, which is likely to happen if no-alcohol regulations are relaxed. "They will change their rules in the next 12 months probably for certain areas having alcohol, and we'll take darts there and have a party in Saudi," he said. "When I got involved in darts, the total prize money was something like £300,000 for the year. This year it will go to £20m. I expect in five years' time, we'll be playing for £40m."

Former electrician Cross charged to the 2018 world title in his first full season, while Adrian Lewis and Michael van Gerwen were multiple victors in their 20s and 16-time champion Phil ‘The Power’ Taylor is widely considered the greatest of all time. Littler is currently fourth in the world rankings, although that is based on a two-year Order of Merit. There have been suggestions from others the spotlight on the teenager means world number one Humphries, 29, has been denied the coverage he deserves, but no darts player has made a mark at such a young age as Littler. "Luke Humphries is another fabulous player who is going to be around for years. Sport is a very brutal world. It is about winning and claiming the high ground. There will be envy around," Hearn said. "Luke Littler is the next Tiger Woods for darts so they better get used to it, and the only way to compete is to get better." World number 38 Martin Lukeman was awestruck as he described facing a peak Littler after being crushed 16-3 in the Grand Slam final, with the teenager winning 15 consecutive legs. "I can't compete with that, it was like Godly. He was relentless, he is so good it's ridiculous," he said. Lukeman can still see the benefits he brings, adding: "What he's done for the sport is brilliant. If it wasn't for him, our wages wouldn't be going up. There's more sponsors, more money coming in, all good." Hearn feels future competition may come from players even younger than Littler. "I watched a 10-year-old a few months ago who averaged 104.89 and checked out a 4-3 win with a 136 finish. They smell the money, the fame and put the hard work in," he said. How much better Littler can get is guesswork, although Plummer believes he wants to reach new heights. "He never says 'how good was I?' But I think he wants to break records and beat Phil Taylor's 16 World Championships and 16 World Matchplay titles," he said. "He's young enough to do it." A version of this article was originally published on 29 November.
• None Know a lot about Littler? Take our quiz
Distance: 0.3901, Text: Luke Littler has risen from 164th to fourth in the rankings in a year

A tearful Luke Littler hit a tournament record 140.91 set average as he started his bid for the PDC World Championship title with a dramatic 3-1 win over Ryan Meikle. The 17-year-old made headlines around the world when he reached the tournament final in January, where he lost to Luke Humphries. Starting this campaign on Saturday, Littler was millimetres away from a nine-darter when he missed double 12 as he blew Meikle away in the fourth and final set of the second-round match. Littler was overcome with emotion at the end, cutting short his on-stage interview. "It was probably the toughest game I've ever played. I had to fight until the end," he said later in a news conference. "As soon as the question came on stage and then boom, the tears came. It was just a bit too much to speak on stage. "It is the worst game I have played. I have never felt anything like that tonight." Admitting to nerves during the match, he told Sky Sports: "Yes, probably the biggest time it's hit me. Coming into it I was fine, but as soon as [referee] George Noble said 'game on', I couldn't throw them." Littler started slowly against Meikle, who had two darts for the opening set, but he took the lead by twice hitting double 20. Meikle did not look overawed against his fellow Englishman and levelled, but Littler won the third set and exploded into life in the fourth. The tournament favourite hit four maximum 180s as he clinched three straight legs in 11, 10 and 11 darts for a record set average, and 100.85 overall. Meanwhile, two seeds crashed out on Saturday night – five-time world champion Raymond van Barneveld lost to Welshman Nick Kenny, while England's Ryan Joyce beat Danny Noppert. Australian Damon Heta was another to narrowly miss out on a nine-darter, just failing on double 12 when throwing for the match in a 3-1 win over Connor Scutt. Ninth seed Heta hit four 100-plus checkouts to come from a set down against Scutt in a match in which both men averaged more than 97.

Littler was hugged by his parents after victory over Meikle

Littler returned to Alexandra Palace to a boisterous reception from more than 3,000 spectators and delivered an astonishing display in the fourth set. He was on for a nine-darter after his opening two throws in both of the first two legs and completed the set in 32 darts - the minimum possible is 27. The teenager will next play after Christmas against European Championship winner Ritchie Edhouse, the 29th seed, or Ian White, and is seeded to meet Humphries in the semi-finals. Having entered last year's event ranked 164th, Littler is up to fourth in the world and will go to number two if he reaches the final again this time. He has won 10 titles in his debut professional year, including the Premier League and Grand Slam of Darts. After reaching the World Championship final as a debutant aged just 16, Littler's life has been transformed and interest in darts has rocketed. Google say he was the most searched-for athlete online in the UK during 2024. This Christmas, more than 100,000 children are expected to be opening Littler-branded magnetic dartboards as presents. His impact has helped double the number of junior academies and has prompted plans to expand the World Championship. Littler was named BBC Young Sports Personality of the Year on Tuesday and was runner-up to athlete Keely Hodgkinson for the main award.

... (output truncated for brevity)

Optimizing Vector Search with Global Secondary Index (GSI)

While the above semantic search using similarity_search_with_score works effectively, we can significantly improve query performance by leveraging Global Secondary Index (GSI) in Couchbase.

Couchbase offers three types of vector indexes, but for GSI-based vector search we focus on two main types:

Hyperscale Vector Indexes (BHIVE)

Best for pure vector searches - content discovery, recommendations, semantic search
High performance with low memory footprint - designed to scale to billions of vectors
Optimized for concurrent operations - supports simultaneous searches and inserts
Use when: You primarily perform vector-only queries without complex scalar filtering
Ideal for: Large-scale semantic search, recommendation systems, content discovery

Composite Vector Indexes

Best for filtered vector searches - combines vector search with scalar value filtering
Efficient pre-filtering - scalar attributes reduce the vector comparison scope
Use when: Your queries combine vector similarity with scalar filters that eliminate large portions of data
Ideal for: Compliance-based filtering, user-specific searches, time-bounded queries

Choosing the Right Index Type

Start with Hyperscale Vector Index for pure vector searches and large datasets
Use Composite Vector Index when scalar filters significantly reduce your search space
Consider your dataset size: Hyperscale scales to billions, Composite works well for tens of millions to billions

For more details, see the Couchbase Vector Index documentation.

Understanding Index Configuration (Couchbase 8.0 Feature)

The index_description parameter controls how Couchbase optimizes vector storage and search performance through centroids and quantization:

Format: 'IVF[<centroids>],{PQ|SQ}<settings>'

Centroids (IVF - Inverted File):

Controls how the dataset is subdivided for faster searches
More centroids = faster search, slower training
Fewer centroids = slower search, faster training
If omitted (like IVF,SQ8), Couchbase auto-selects based on dataset size

Quantization Options:

SQ (Scalar Quantization): SQ4, SQ6, SQ8 (4, 6, or 8 bits per dimension)
PQ (Product Quantization): PQx (e.g., PQ32x8)
Higher values = better accuracy, larger index size

Common Examples:

IVF,SQ8 - Auto centroids, 8-bit scalar quantization (good default)
IVF1000,SQ6 - 1000 centroids, 6-bit scalar quantization
IVF,PQ32x8 - Auto centroids, 32 subquantizers with 8 bits

For detailed configuration options, see the Quantization & Centroid Settings.

In the code below, we demonstrate creating a BHIVE index. This method takes an index type (BHIVE or COMPOSITE) and description parameter for optimization settings. Alternatively, GSI indexes can be created manually from the Couchbase UI.

vector_store.create_index(index_type=IndexType.BHIVE, index_name="azure_bhive_index",index_description="IVF,SQ8")

The example below shows running the same similarity search, but now using the BHIVE GSI index we created above. You'll notice improved performance as the index efficiently retrieves data.

Important: When using Composite indexes, scalar filters take precedence over vector similarity, which can improve performance for filtered searches but may miss some semantically relevant results that don't match the scalar criteria.

Note: In GSI vector search, the distance represents the vector distance between the query and document embeddings. Lower distance indicate higher similarity, while higher distance indicate lower similarity.

query = "What were Luke Littler's key achievements and records in his recent PDC World Championship match?"

try:
    # Perform the semantic search
    start_time = time.time()
    search_results = vector_store.similarity_search_with_score(query, k=10)
    search_elapsed_time = time.time() - start_time

    logging.info(f"Semantic search completed in {search_elapsed_time:.2f} seconds")

    # Display search results
    print(f"\nSemantic Search Results (completed in {search_elapsed_time:.2f} seconds):")
    print("-" * 80)

    for doc, score in search_results:
        print(f"Distance: {score:.4f}, Text: {doc.page_content}")
        print("-" * 80)

except CouchbaseException as e:
    raise RuntimeError(f"Error performing semantic search: {str(e)}")
except Exception as e:
    raise RuntimeError(f"Unexpected error: {str(e)}")

2025-09-22 12:42:10,244 - INFO - Semantic search completed in 1.30 seconds



Semantic Search Results (completed in 1.30 seconds):
--------------------------------------------------------------------------------
Distance: 0.3697, Text: The Littler effect - how darts hit the bullseye

Teenager Luke Littler began his bid to win the 2025 PDC World Darts Championship with a second-round win against Ryan Meikle. Here we assess Littler's impact after a remarkable rise which saw him named BBC Young Sports Personality of the Year and runner-up in the main award to athlete Keely Hodgkinson.

One year ago, he was barely a household name in his own home. Now he is a sporting phenomenon. After emerging from obscurity aged 16 to reach the World Championship final, the life of Luke Littler and the sport he loves has been transformed. Viewing figures, ticket sales and social media interest have rocketed. Darts has hit the bullseye. This Christmas more than 100,000 children are expected to be opening Littler-branded magnetic dartboards as presents. His impact has helped double the number of junior academies, prompted plans to expand the World Championship and generated interest in darts from Saudi Arabian backers.

Just months after taking his GCSE exams and ranked 164th in the world, Littler beat former champions Raymond van Barneveld and Rob Cross en route to the PDC World Championship final in January, before his run ended with a 7-4 loss to Luke Humphries. With his nickname 'The Nuke' on his purple and yellow shirt and the Alexandra Palace crowd belting out his walk-on song, Pitbull's tune Greenlight, he became an instant hit. Electric on the stage, calm off it. The down-to-earth teenager celebrated with a kebab and computer games. "We've been watching his progress since he was about seven. He was on our radar, but we never anticipated what would happen. The next thing we know 'Littlermania' is spreading everywhere," PDC president Barry Hearn told BBC Sport. A peak TV audience of 3.7 million people watched the final - easily Sky's biggest figure for a non-football sporting event. The teenager from Warrington in Cheshire was too young to legally drive or drink alcohol, but earned £200,000 for finishing second - part of £1m prize money in his first year as a professional - and an invitation to the elite Premier League competition. He turned 17 later in January but was he too young for the demanding event over 17 Thursday nights in 17 locations? He ended up winning the whole thing, and hit a nine-dart finish against Humphries in the final. From Bahrain to Wolverhampton, Littler claimed 10 titles in 2024 and is now eyeing the World Championship.

As he progressed at the Ally Pally, the Manchester United fan was sent a good luck message by the club's former midfielder and ex-England captain David Beckham. In 12 months, Littler's Instagram followers have risen from 4,000 to 1.3m. Commercial backers include a clothing range, cereal firm and train company and he will appear in a reboot of the TV darts show Bullseye. Google say he was the most searched-for athlete online in the UK during 2024. On the back of his success, Littler darts, boards, cabinets, shirts are being snapped up in big numbers. "This Christmas the junior magnetic dartboard is selling out, we're talking over 100,000. They're 20 quid and a great introduction for young children," said Garry Plummer, the boss of sponsors Target Darts, who first signed a deal with Littler's family when he was aged 12. "All the toy shops want it, they all want him - 17, clean, doesn't drink, wonderful."

Littler beat Luke Humphries to win the Premier League title in May

The number of academies for children under the age of 16 has doubled in the last year, says Junior Darts Corporation chairman Steve Brown. There are 115 dedicated groups offering youngsters equipment, tournaments and a place to develop, with bases including Australia, Bulgaria, Greece, Norway, USA and Mongolia. "We've seen so many inquiries from around the world, it's been such a boom. It took us 14 years to get 1,600 members and within 12 months we have over 3,000, and waiting lists," said Brown. "When I played darts as a child, I was quite embarrassed to tell my friends what my hobby was. All these kids playing darts now are pretty popular at school. It's a bit rock 'n roll and recognised as a cool thing to do." Plans are being hatched to extend the World Championship by four days and increase the number of players from 96 to 128. That will boost the number of tickets available by 25,000 to 115,000 but Hearn reckons he could sell three times as many. He says Saudi Arabia wants to host a tournament, which is likely to happen if no-alcohol regulations are relaxed. "They will change their rules in the next 12 months probably for certain areas having alcohol, and we'll take darts there and have a party in Saudi," he said. "When I got involved in darts, the total prize money was something like £300,000 for the year. This year it will go to £20m. I expect in five years' time, we'll be playing for £40m."

Former electrician Cross charged to the 2018 world title in his first full season, while Adrian Lewis and Michael van Gerwen were multiple victors in their 20s and 16-time champion Phil ‘The Power’ Taylor is widely considered the greatest of all time. Littler is currently fourth in the world rankings, although that is based on a two-year Order of Merit. There have been suggestions from others the spotlight on the teenager means world number one Humphries, 29, has been denied the coverage he deserves, but no darts player has made a mark at such a young age as Littler. "Luke Humphries is another fabulous player who is going to be around for years. Sport is a very brutal world. It is about winning and claiming the high ground. There will be envy around," Hearn said. "Luke Littler is the next Tiger Woods for darts so they better get used to it, and the only way to compete is to get better." World number 38 Martin Lukeman was awestruck as he described facing a peak Littler after being crushed 16-3 in the Grand Slam final, with the teenager winning 15 consecutive legs. "I can't compete with that, it was like Godly. He was relentless, he is so good it's ridiculous," he said. Lukeman can still see the benefits he brings, adding: "What he's done for the sport is brilliant. If it wasn't for him, our wages wouldn't be going up. There's more sponsors, more money coming in, all good." Hearn feels future competition may come from players even younger than Littler. "I watched a 10-year-old a few months ago who averaged 104.89 and checked out a 4-3 win with a 136 finish. They smell the money, the fame and put the hard work in," he said. How much better Littler can get is guesswork, although Plummer believes he wants to reach new heights. "He never says 'how good was I?' But I think he wants to break records and beat Phil Taylor's 16 World Championships and 16 World Matchplay titles," he said. "He's young enough to do it." A version of this article was originally published on 29 November.
• None Know a lot about Littler? Take our quiz
--------------------------------------------------------------------------------
Distance: 0.3901, Text: Luke Littler has risen from 164th to fourth in the rankings in a year

A tearful Luke Littler hit a tournament record 140.91 set average as he started his bid for the PDC World Championship title with a dramatic 3-1 win over Ryan Meikle. The 17-year-old made headlines around the world when he reached the tournament final in January, where he lost to Luke Humphries. Starting this campaign on Saturday, Littler was millimetres away from a nine-darter when he missed double 12 as he blew Meikle away in the fourth and final set of the second-round match. Littler was overcome with emotion at the end, cutting short his on-stage interview. "It was probably the toughest game I've ever played. I had to fight until the end," he said later in a news conference. "As soon as the question came on stage and then boom, the tears came. It was just a bit too much to speak on stage. "It is the worst game I have played. I have never felt anything like that tonight." Admitting to nerves during the match, he told Sky Sports: "Yes, probably the biggest time it's hit me. Coming into it I was fine, but as soon as [referee] George Noble said 'game on', I couldn't throw them." Littler started slowly against Meikle, who had two darts for the opening set, but he took the lead by twice hitting double 20. Meikle did not look overawed against his fellow Englishman and levelled, but Littler won the third set and exploded into life in the fourth. The tournament favourite hit four maximum 180s as he clinched three straight legs in 11, 10 and 11 darts for a record set average, and 100.85 overall. Meanwhile, two seeds crashed out on Saturday night – five-time world champion Raymond van Barneveld lost to Welshman Nick Kenny, while England's Ryan Joyce beat Danny Noppert. Australian Damon Heta was another to narrowly miss out on a nine-darter, just failing on double 12 when throwing for the match in a 3-1 win over Connor Scutt. Ninth seed Heta hit four 100-plus checkouts to come from a set down against Scutt in a match in which both men averaged more than 97.

Littler was hugged by his parents after victory over Meikle

... (output truncated for brevity)

Note: To create a COMPOSITE index, the below code can be used. Choose based on your specific use case and query patterns. For this tutorial's question-answering scenario using the TREC dataset, either index type would work, but BHIVE might be more efficient for pure semantic search across questions.

vector_store.create_index(index_type=IndexType.COMPOSITE, index_name="azure_composite_index", index_description="IVF,SQ8")

Setting Up a Couchbase Cache

To further optimize our system, we set up a Couchbase-based cache. A cache is a temporary storage layer that holds data that is frequently accessed, speeding up operations by reducing the need to repeatedly retrieve the same information from the database. In our setup, the cache will help us accelerate repetitive tasks, such as looking up similar documents. By implementing a cache, we enhance the overall performance of our search engine, ensuring that it can handle high query volumes and deliver results quickly.

Caching is particularly valuable in scenarios where users may submit similar queries multiple times or where certain pieces of information are frequently requested. By storing these in a cache, we can significantly reduce the time it takes to respond to these queries, improving the user experience.

try:
    cache = CouchbaseCache(
        cluster=cluster,
        bucket_name=CB_BUCKET_NAME,
        scope_name=SCOPE_NAME,
        collection_name=CACHE_COLLECTION,
    )
    logging.info("Successfully created cache")
    set_llm_cache(cache)
except Exception as e:
    raise ValueError(f"Failed to create cache: {str(e)}")

2025-09-22 12:42:21,917 - INFO - Successfully created cache

Retrieval-Augmented Generation (RAG) with Couchbase and Langchain

Couchbase and LangChain can be seamlessly integrated to create RAG (Retrieval-Augmented Generation) chains, enhancing the process of generating contextually relevant responses. In this setup, Couchbase serves as the vector store, where embeddings of documents are stored. When a query is made, LangChain retrieves the most relevant documents from Couchbase by comparing the query’s embedding with the stored document embeddings. These documents, which provide contextual information, are then passed to a generative language model within LangChain.

The language model, equipped with the context from the retrieved documents, generates a response that is both informed and contextually accurate. This integration allows the RAG chain to leverage Couchbase’s efficient storage and retrieval capabilities, while LangChain handles the generation of responses based on the context provided by the retrieved documents. Together, they create a powerful system that can deliver highly relevant and accurate answers by combining the strengths of both retrieval and generation.

# Create RAG prompt template
rag_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant that answers questions based on the provided context."),
    ("human", "Context: {context}\n\nQuestion: {question}")
])

# Create RAG chain
rag_chain = (
    {"context": vector_store.as_retriever(), "question": RunnablePassthrough()}
    | rag_prompt
    | llm
    | StrOutputParser()
)
logging.info("Successfully created RAG chain")

2025-09-16 13:41:05,596 - INFO - Successfully created RAG chain

start_time = time.time()
# Turn off excessive Logging 
logging.basicConfig(level=logging.WARNING, format='%(asctime)s - %(levelname)s - %(message)s', force=True)

try:
    rag_response = rag_chain.invoke(query)
    rag_elapsed_time = time.time() - start_time
    print(f"RAG Response: {rag_response}")
    print(f"RAG response generated in {rag_elapsed_time:.2f} seconds")
except InternalServerFailureException as e:
    if "query request rejected" in str(e):
        print("Error: Search request was rejected due to rate limiting. Please try again later.")
    else:
        print(f"Internal server error occurred: {str(e)}")
except Exception as e:
    print(f"Unexpected error occurred: {str(e)}")

RAG Response: In his recent PDC World Championship match, Luke Littler achieved several key milestones and records:

1. **Tournament Record Average**: Littler set a tournament record with a 140.91 set average during the fourth and final set of his second-round match against Ryan Meikle.

2. **Nine-Darter Attempt**: He came close to achieving a nine-darter but narrowly missed double 12.

3. **Dramatic Victory**: Littler defeated Meikle 3-1 in a match described as emotionally challenging for the 17-year-old.

4. **Fourth Set Dominance**: In the final set, Littler exploded into life, hitting four maximum 180s and winning three straight legs in 11, 10, and 11 darts.

5. **Overall Set Performance**: He completed the fourth set in 32 darts (the minimum possible is 27) and achieved a match average of 100.85.

These achievements highlight Littler's exceptional talent and his continued rise in professional darts.
RAG response generated in 5.81 seconds

Using Couchbase as a caching mechanism

Couchbase can be effectively used as a caching mechanism for RAG (Retrieval-Augmented Generation) responses by storing and retrieving precomputed results for specific queries. This approach enhances the system's efficiency and speed, particularly when dealing with repeated or similar queries. When a query is first processed, the RAG chain retrieves relevant documents, generates a response using the language model, and then stores this response in Couchbase, with the query serving as the key.

For subsequent requests with the same query, the system checks Couchbase first. If a cached response is found, it is retrieved directly from Couchbase, bypassing the need to re-run the entire RAG process. This significantly reduces response time because the computationally expensive steps of document retrieval and response generation are skipped. Couchbase's role in this setup is to provide a fast and scalable storage solution for caching these responses, ensuring that frequently asked queries can be answered more quickly and efficiently.

try:
    queries = [
        "What happened in the match between Fullham and Liverpool?",
        "What were Luke Littler's key achievements and records in his recent PDC World Championship match?",
        "What happened in the match between Fullham and Liverpool?", # Repeated query
    ]

    for i, query in enumerate(queries, 1):
        print(f"\nQuery {i}: {query}")
        start_time = time.time()

        response = rag_chain.invoke(query)
        elapsed_time = time.time() - start_time
        print(f"Response: {response}")
        print(f"Time taken: {elapsed_time:.2f} seconds")

except InternalServerFailureException as e:
    if "query request rejected" in str(e):
        print("Error: Search request was rejected due to rate limiting. Please try again later.")
    else:
        print(f"Internal server error occurred: {str(e)}")
except Exception as e:
    print(f"Unexpected error occurred: {str(e)}")

Query 1: What happened in the match between Fullham and Liverpool?
Response: In the Premier League match between Fulham and Liverpool, the game ended in a 2-2 draw at Anfield. Liverpool played the majority of the game with ten men after Andy Robertson was shown a red card in the 17th minute for denying Harry Wilson a goalscoring opportunity. Despite their numerical disadvantage, Liverpool demonstrated resilience and strong performance.

Fulham took the lead twice during the match, but Liverpool managed to equalize on both occasions. Diogo Jota, returning from injury, scored the crucial 86th-minute equalizer for Liverpool. Even with 10 players, Liverpool maintained over 60% possession and led various attacking metrics, including shots, big chances, and touches in the opposition box. 

Fulham's left-back Antonee Robinson praised Liverpool’s performance, stating that it didn’t feel like they had 10 men on the field due to their attacking risks and relentless pressure. Liverpool head coach Arne Slot called his team's performance "impressive" and lauded their character and fight in adversity.
Time taken: 6.69 seconds

Query 2: What were Luke Littler's key achievements and records in his recent PDC World Championship match?
Response: In his recent PDC World Championship match, Luke Littler achieved several key milestones and records:

1. **Tournament Record Average**: Littler set a tournament record with a 140.91 set average during the fourth and final set of his second-round match against Ryan Meikle.

2. **Nine-Darter Attempt**: He came close to achieving a nine-darter but narrowly missed double 12.

3. **Dramatic Victory**: Littler defeated Meikle 3-1 in a match described as emotionally challenging for the 17-year-old.

4. **Fourth Set Dominance**: In the final set, Littler exploded into life, hitting four maximum 180s and winning three straight legs in 11, 10, and 11 darts.

5. **Overall Set Performance**: He completed the fourth set in 32 darts (the minimum possible is 27) and achieved a match average of 100.85.

These achievements highlight Littler's exceptional talent and his continued rise in professional darts.
Time taken: 1.09 seconds


... (output truncated for brevity)

By following these steps, you'll have a fully functional semantic search engine that leverages the strengths of Couchbase and AzureOpenAI. This guide is designed not just to show you how to build the system, but also to explain why each step is necessary, giving you a deeper understanding of the principles behind semantic search and how it improves querying data more efficiently using GSI which can significantly improve your RAG performance. Whether you're a newcomer to software development or an experienced developer looking to expand your skills, this guide will provide you with the knowledge and tools you need to create a powerful, AI-driven search engine.

Contents