RAG with Cohere using Couchbase Hyperscale and Composite Vector Index

Learn how to build a semantic search engine with Couchbase and Cohere, using Couchbase's Hyperscale and Composite Vector Indexes.
This tutorial demonstrates how to integrate Couchbase's vector search capabilities with Cohere embeddings and language models.
You'll understand how to perform Retrieval-Augmented Generation (RAG) using LangChain and Couchbase.

Introduction

In this guide, we will walk you through building a powerful semantic search engine using Couchbase as the backend database and Cohere as the AI-powered embedding and language model provider. Semantic search goes beyond simple keyword matching by understanding the context and meaning behind the words in a query, making it an essential tool for applications that require intelligent information retrieval. This tutorial is designed to be beginner-friendly, with clear, step-by-step instructions that will equip you with the knowledge to build a fully functional semantic search system from scratch using Couchbase Hyperscale and Composite Vector Indexes. Alternatively, if you want to perform semantic search using the Search Vector Index, please take a look at this.

How to run this tutorial

This tutorial is available as a Jupyter Notebook (.ipynb file) that you can run interactively. You can access the original notebook here.

You can either download the notebook file and run it on Google Colab or run it on your system by setting up the Python environment.

Before you start

Get Credentials for Cohere

Please follow the instructions to generate the Cohere credentials.

Create and Deploy Your Free Tier Operational cluster on Capella

To get started with Couchbase Capella, create an account and use it to deploy a forever free tier operational cluster. This account provides you with an environment where you can explore and learn about Capella with no time constraint.

To learn more, please follow the instructions.

Note: To run this tutorial, you need Capella with Couchbase Server version 8.0 or above, because Hyperscale and Composite Vector Index search is supported only from version 8.0 onward.

Couchbase Capella Configuration

When running Couchbase using Capella, the following prerequisites need to be met.

Create the database credentials to access the required bucket (Read and Write) used in the application.
Allow access to the Cluster from the IP on which the application is running.

Setting the Stage: Installing Necessary Libraries

To build our semantic search engine, we need a robust set of tools. The libraries we install handle everything from connecting to databases to performing complex machine learning tasks.

%pip install --quiet datasets==3.5.0 langchain-couchbase==1.0.1 langchain-cohere==0.5.0 python-dotenv==1.1.1

Note: you may need to restart the kernel to use updated packages.

Importing Necessary Libraries

The script starts by importing a series of libraries required for various tasks, including handling JSON, logging, time tracking, Couchbase connections, embedding generation, and dataset loading. These libraries provide essential functions for working with data, managing database connections, and processing machine learning models.

import getpass
import json
import logging
import os
import time
from datetime import timedelta
from uuid import uuid4

from couchbase.auth import PasswordAuthenticator
from couchbase.cluster import Cluster
from couchbase.exceptions import (CouchbaseException,
                                  InternalServerFailureException,
                                  QueryIndexAlreadyExistsException,
                                  ServiceUnavailableException)
from couchbase.management.buckets import CreateBucketSettings
from couchbase.management.search import SearchIndex
from couchbase.options import ClusterOptions
from datasets import load_dataset
from dotenv import load_dotenv
from langchain_cohere import ChatCohere, CohereEmbeddings
from langchain_core.globals import set_llm_cache
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_couchbase.cache import CouchbaseCache
from langchain_couchbase.vectorstores import CouchbaseQueryVectorStore
from langchain_couchbase.vectorstores import DistanceStrategy
from langchain_couchbase.vectorstores import IndexType


Setup Logging

Logging is configured to track the progress of the script and capture any errors or warnings. This is crucial for debugging and understanding the flow of execution. The logging output includes timestamps, log levels (e.g., INFO, ERROR), and messages that describe what is happening in the script.

logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s', force=True)

# Suppress excessive logging from third-party libraries
logging.getLogger('openai').setLevel(logging.WARNING)
logging.getLogger('httpx').setLevel(logging.WARNING)
logging.getLogger('langchain_cohere').setLevel(logging.ERROR)

Loading Sensitive Information

In this section, we prompt the user to input essential configuration settings needed for integrating Couchbase with Cohere's API. These settings include sensitive information like API keys, database credentials, and specific configuration names. Instead of hardcoding these details into the script, we request the user to provide them at runtime, ensuring flexibility and security.

The script also checks that the Cohere API key is present and raises an error if it is missing, so a misconfigured environment fails early rather than partway through the tutorial. The remaining settings fall back to the defaults shown in each prompt.

load_dotenv()

COHERE_API_KEY = os.getenv('COHERE_API_KEY') or getpass.getpass('Enter your Cohere API key: ')
CB_HOST = os.getenv('CB_HOST') or input('Enter your Couchbase host (default: couchbase://localhost): ') or 'couchbase://localhost'
CB_USERNAME = os.getenv('CB_USERNAME') or input('Enter your Couchbase username (default: Administrator): ') or 'Administrator'
CB_PASSWORD = os.getenv('CB_PASSWORD') or getpass.getpass('Enter your Couchbase password (default: password): ') or 'password'
CB_BUCKET_NAME = os.getenv('CB_BUCKET_NAME') or input('Enter your Couchbase bucket name (default: query-vector-search-testing): ') or 'query-vector-search-testing'
SCOPE_NAME = os.getenv('SCOPE_NAME') or input('Enter your scope name (default: shared): ') or 'shared'
COLLECTION_NAME = os.getenv('COLLECTION_NAME') or input('Enter your collection name (default: cohere): ') or 'cohere'
CACHE_COLLECTION = os.getenv('CACHE_COLLECTION') or input('Enter your cache collection name (default: cache): ') or 'cache'

# Check if the variables are correctly loaded
if not COHERE_API_KEY:
    raise ValueError("COHERE_API_KEY is not provided and is required.")
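The environment-then-default pattern used above can be factored into a small helper. The following is a minimal sketch, not part of the original notebook: `read_setting` and `demo_env` are hypothetical names introduced here for illustration, and the helper skips the interactive prompt so that it can run unattended.

```python
import os


def read_setting(name, default=None, env=None):
    """Return a setting from the environment, falling back to a default.

    Raises ValueError when neither the environment nor the default
    provides a non-empty value.
    """
    env = os.environ if env is None else env
    value = env.get(name) or default
    if not value:
        raise ValueError(f"{name} is not provided and is required.")
    return value


# Hypothetical environment used only for this demonstration.
demo_env = {"CB_USERNAME": "Administrator", "CB_BUCKET_NAME": ""}

username = read_setting("CB_USERNAME", env=demo_env)
# An empty value is treated as missing, so the default is used instead.
bucket = read_setting("CB_BUCKET_NAME", default="query-vector-search-testing", env=demo_env)
print(username, bucket)
```

Treating an empty string as missing mirrors the `or` chains in the cell above, where a blank answer falls through to the default.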

Connecting to the Couchbase Cluster

Connecting to a Couchbase cluster is the foundation of our project. Couchbase will serve as our primary data store, handling all the storage and retrieval operations required for our semantic search engine. By establishing this connection, we enable our application to interact with the database, allowing us to perform operations such as storing embeddings, querying data, and managing collections. This connection is the gateway through which all data will flow, so ensuring it's set up correctly is paramount.

try:
    auth = PasswordAuthenticator(CB_USERNAME, CB_PASSWORD)
    options = ClusterOptions(auth)
    cluster = Cluster(CB_HOST, options)
    cluster.wait_until_ready(timedelta(seconds=5))
    logging.info("Successfully connected to Couchbase")
except Exception as e:
    raise ConnectionError(f"Failed to connect to Couchbase: {str(e)}")

2026-02-03 11:19:45,592 - INFO - Successfully connected to Couchbase

Setting Up Collections in Couchbase

The setup_collection() function handles creating and configuring the hierarchical data organization in Couchbase:

Bucket Creation:
- Checks if specified bucket exists, creates it if not
- Sets bucket properties like RAM quota (1024MB) and replication (disabled)
- Note: You will not be able to create a bucket on Capella

Scope Management:
- Verifies if requested scope exists within bucket
- Creates new scope if needed (unless it's the default "_default" scope)

Collection Setup:
- Checks for collection existence within scope
- Creates collection if it doesn't exist
- Waits 2 seconds for collection to be ready

Additional Tasks:
- Clears any existing documents for clean state
- Implements comprehensive error handling and logging

The function is called twice to set up:
- Main collection for vector embeddings
- Cache collection for storing results

def setup_collection(cluster, bucket_name, scope_name, collection_name):
    try:
        # Check if bucket exists, create if it doesn't
        try:
            bucket = cluster.bucket(bucket_name)
            logging.info(f"Bucket '{bucket_name}' exists.")
        except Exception as e:
            logging.info(f"Bucket '{bucket_name}' does not exist. Creating it...")
            bucket_settings = CreateBucketSettings(
                name=bucket_name,
                bucket_type='couchbase',
                ram_quota_mb=1024,
                flush_enabled=True,
                num_replicas=0
            )
            cluster.buckets().create_bucket(bucket_settings)
            time.sleep(2)  # Wait for bucket creation to complete and become available
            bucket = cluster.bucket(bucket_name)
            logging.info(f"Bucket '{bucket_name}' created successfully.")

        bucket_manager = bucket.collections()

        # Check if scope exists, create if it doesn't
        scopes = bucket_manager.get_all_scopes()
        scope_exists = any(scope.name == scope_name for scope in scopes)
        
        if not scope_exists and scope_name != "_default":
            logging.info(f"Scope '{scope_name}' does not exist. Creating it...")
            bucket_manager.create_scope(scope_name)
            logging.info(f"Scope '{scope_name}' created successfully.")

        # Check if collection exists, create if it doesn't
        collection_exists = any(
            scope.name == scope_name and collection_name in [col.name for col in scope.collections]
            for scope in scopes
        )

        if not collection_exists:
            logging.info(f"Collection '{collection_name}' does not exist. Creating it...")
            bucket_manager.create_collection(scope_name, collection_name)
            logging.info(f"Collection '{collection_name}' created successfully.")
        else:
            logging.info(f"Collection '{collection_name}' already exists. Skipping creation.")

        # Wait for collection to be ready
        collection = bucket.scope(scope_name).collection(collection_name)
        time.sleep(2)  # Give the collection time to be ready for queries

        # Clear all documents in the collection
        try:
            query = f"DELETE FROM `{bucket_name}`.`{scope_name}`.`{collection_name}`"
            cluster.query(query).execute()
            logging.info("All documents cleared from the collection.")
        except Exception as e:
            logging.warning(f"Error while clearing documents: {str(e)}. The collection might be empty.")

        return collection
    except Exception as e:
        raise RuntimeError(f"Error setting up collection: {str(e)}")
    
setup_collection(cluster, CB_BUCKET_NAME, SCOPE_NAME, COLLECTION_NAME)
setup_collection(cluster, CB_BUCKET_NAME, SCOPE_NAME, CACHE_COLLECTION)

2026-02-03 11:19:45,604 - INFO - Bucket 'vector-search-testing' exists.
2026-02-03 11:19:45,609 - INFO - Collection 'cohere' already exists. Skipping creation.
2026-02-03 11:19:47,627 - INFO - All documents cleared from the collection.
2026-02-03 11:19:47,629 - INFO - Bucket 'vector-search-testing' exists.
2026-02-03 11:19:47,633 - INFO - Collection 'cache' already exists. Skipping creation.
2026-02-03 11:19:49,657 - INFO - All documents cleared from the collection.





<couchbase.collection.Collection at 0x12f351a60>

Creating Cohere Embeddings

Embeddings are at the heart of semantic search. They are numerical representations of text that capture the semantic meaning of the words and phrases. Unlike traditional keyword-based search, which looks for exact matches, embeddings allow our search engine to understand the context and nuances of language, enabling it to retrieve documents that are semantically similar to the query, even if they don't contain the exact keywords. By creating embeddings using Cohere, we equip our search engine with the ability to understand and process natural language in a way that's much closer to how humans understand language. This step transforms our raw text data into a format that the search engine can use to find and rank relevant documents.

try:
    embeddings = CohereEmbeddings(
        cohere_api_key=COHERE_API_KEY,
        model="embed-english-v3.0",
    )
    logging.info("Successfully created CohereEmbeddings")
except Exception as e:
    raise ValueError(f"Error creating CohereEmbeddings: {str(e)}")

2026-02-03 11:19:52,039 - INFO - Successfully created CohereEmbeddings
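To make the idea of "semantically similar" concrete, here is a minimal sketch that compares a few hand-made vectors with cosine similarity. The three-dimensional vectors are invented for illustration only; real Cohere embeddings have far more dimensions and are produced by the model, not written by hand.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up "embeddings": the first two phrases share no keywords
# but point in nearly the same direction.
toy_embeddings = {
    "football manager": [0.9, 0.1, 0.0],
    "soccer coach": [0.8, 0.2, 0.1],
    "stock market": [0.0, 0.2, 0.9],
}

query_vec = toy_embeddings["football manager"]
sim_coach = cosine_similarity(query_vec, toy_embeddings["soccer coach"])
sim_stocks = cosine_similarity(query_vec, toy_embeddings["stock market"])
print(f"soccer coach: {sim_coach:.3f}, stock market: {sim_stocks:.3f}")
```

The related phrase scores close to 1 while the unrelated one scores close to 0, which is the property the search engine relies on.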

Understanding Hyperscale and Composite Vector Search

Optimizing Vector Search with Hyperscale and Composite Vector Index

With Couchbase 8.0+, you can leverage the power of query-based vector search, which offers significant performance improvements over traditional Full-Text Search (FTS) approaches for vector-first workloads. Hyperscale and Composite Vector Index search provides high-performance vector similarity search with advanced filtering capabilities and is designed to scale to billions of vectors.

Hyperscale/Composite vs Search Vector Index: Choosing the Right Approach

Feature	Hyperscale/Composite Vector Index	Search Vector Index
Best For	Vector-first workloads, complex filtering, high QPS performance	Hybrid search and high recall rates
Couchbase Version	8.0.0+	7.6+
Filtering	Pre-filtering with `WHERE` clauses (Composite) or post-filtering (Hyperscale)	Pre-filtering with flexible ordering
Scalability	Up to billions of vectors (Hyperscale)	Up to 10 million vectors
Performance	Optimized for concurrent operations with low memory footprint	Good for mixed text and vector queries

Query-Based Vector Index Types

Couchbase offers two distinct query-based vector index types, each optimized for different use cases:

Hyperscale Vector Indexes

Best for: Pure vector searches like content discovery, recommendations, and semantic search
Use when: You primarily perform vector-only queries without complex scalar filtering
Features:
- High performance with low memory footprint
- Optimized for concurrent operations
- Designed to scale to billions of vectors
- Supports post-scan filtering for basic metadata filtering

Composite Vector Indexes

Best for: Filtered vector searches that combine vector similarity with scalar value filtering
Use when: Your queries combine vector similarity with scalar filters that eliminate large portions of data
Features:
- Efficient pre-filtering where scalar attributes reduce the vector comparison scope
- Best for well-defined workloads that require complex scalar filtering
- Supports range lookups combined with vector search

Index Type Selection for This Tutorial

In this tutorial, we'll create a Hyperscale index and run vector similarity queries against it. Hyperscale is ideal for semantic search scenarios where you want:

High-performance vector search across large datasets
Low latency for real-time applications
Scalability to handle growing vector collections
Concurrent operations for multi-user environments

The Hyperscale index will provide optimal performance for our Cohere embedding-based semantic search implementation.

Alternative: Composite Vector Index

If your use case requires complex filtering with scalar attributes, you may want to consider using a Composite Vector Index instead:

# Alternative: create a Composite index for filtered searches.
# vector_store is the CouchbaseQueryVectorStore created later in this tutorial.
vector_store.create_index(
    index_type=IndexType.COMPOSITE,
    index_description="IVF,SQ8",
    distance_metric=DistanceStrategy.COSINE,
    index_name="cohere_composite_index",
)

Use Composite indexes when:

You need to filter by document metadata or attributes before vector similarity
Your queries combine vector search with WHERE clauses
You have well-defined filtering requirements that can reduce the search space

Note: Composite indexes enable pre-filtering with scalar attributes, making them ideal for applications where you need to search within specific categories, date ranges, or user-specific data segments.
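The difference between pre-filtering and post-filtering is easiest to see on a toy dataset. The sketch below is a conceptual illustration in plain Python, not how Couchbase implements either index type; the documents, the `category` field, and both function names are assumptions made up for this example.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Invented documents with a scalar attribute and a 2-d vector.
docs = [
    {"id": "a", "category": "sport", "vec": [0.9, 0.1]},
    {"id": "b", "category": "business", "vec": [0.88, 0.12]},
    {"id": "c", "category": "sport", "vec": [0.5, 0.5]},
    {"id": "d", "category": "business", "vec": [0.1, 0.9]},
]
query_vec = [1.0, 0.0]


def pre_filter_search(docs, query_vec, category, k):
    """Apply the scalar filter first, then rank only the survivors."""
    candidates = [d for d in docs if d["category"] == category]
    candidates.sort(key=lambda d: squared_l2(d["vec"], query_vec))
    return [d["id"] for d in candidates[:k]]


def post_filter_search(docs, query_vec, category, k):
    """Take the k nearest vectors first, then drop non-matching ones."""
    nearest = sorted(docs, key=lambda d: squared_l2(d["vec"], query_vec))[:k]
    return [d["id"] for d in nearest if d["category"] == category]


pre = pre_filter_search(docs, query_vec, "sport", k=2)
post = post_filter_search(docs, query_vec, "sport", k=2)
print("pre-filter:", pre, "post-filter:", post)
```

In this toy case pre-filtering returns two sport articles, while naive post-filtering returns only one because a business article took a slot among the nearest neighbours. This is why selective scalar filters tend to favour the pre-filtering approach.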

Understanding Index Configuration (Couchbase 8.0 Feature)

Before creating our Hyperscale index, it's important to understand the configuration parameters that optimize vector storage and search performance. The index_description parameter controls how Couchbase optimizes vector storage through centroids and quantization.

Index Description Format: `'IVF[<centroids>],{PQ|SQ}<settings>'`

Centroids (IVF - Inverted File)

Controls how the dataset is subdivided for faster searches
More centroids = faster search, slower training time
Fewer centroids = slower search, faster training time
If omitted (like IVF,SQ8), Couchbase auto-selects based on dataset size
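The role of centroids can be sketched with a tiny inverted-file (IVF) search in plain Python. This is a simplified illustration under stated assumptions: the centroids are fixed by hand rather than trained, and the `nprobe` parameter and all names are hypothetical, chosen only to show the idea of scanning a subset of the data.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Hand-picked centroids; a real index learns these during training.
centroids = [[0.0, 0.0], [10.0, 10.0]]
vectors = {
    "v1": [0.5, 0.2],
    "v2": [1.0, 1.5],
    "v3": [9.0, 9.5],
    "v4": [10.5, 10.0],
}


def nearest_centroid(vec):
    """Index of the centroid closest to vec."""
    return min(range(len(centroids)), key=lambda i: squared_l2(vec, centroids[i]))


# Build the inverted lists: each vector is stored under its nearest centroid.
inverted_lists = {i: [] for i in range(len(centroids))}
for key, vec in vectors.items():
    inverted_lists[nearest_centroid(vec)].append(key)


def ivf_search(query_vec, k, nprobe=1):
    """Scan only the nprobe closest lists instead of every vector."""
    probe_order = sorted(
        range(len(centroids)), key=lambda i: squared_l2(query_vec, centroids[i])
    )
    candidates = [key for i in probe_order[:nprobe] for key in inverted_lists[i]]
    candidates.sort(key=lambda key: squared_l2(vectors[key], query_vec))
    return candidates[:k]


result = ivf_search([9.5, 9.5], k=2)
print(inverted_lists, result)
```

With more centroids each list is shorter, so a query scans fewer vectors, but building the index means assigning every vector to one of more clusters.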

Quantization Options

Scalar Quantization (SQ):

SQ4, SQ6, SQ8 (4, 6, or 8 bits per dimension)
Lower memory usage, faster search, slightly reduced accuracy

Product Quantization (PQ):

Format: PQ<subquantizers>x<bits> (e.g., PQ32x8)
Better compression for very large datasets
More complex but can maintain accuracy with smaller index size
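As a rough illustration of scalar quantization, the sketch below maps each float to an 8-bit code and back. It is a generic textbook version, not Couchbase's internal encoding; the fixed value range of -1.0 to 1.0 and the function names are assumptions for this example.

```python
def sq_encode(vec, lo, hi, bits=8):
    """Map each value in [lo, hi] to an integer code of the given bit width."""
    levels = 2 ** bits - 1
    step = (hi - lo) / levels
    # Clamp to the range, then round to the nearest quantization level.
    return [round((min(max(x, lo), hi) - lo) / step) for x in vec]


def sq_decode(codes, lo, hi, bits=8):
    """Approximate the original values from their integer codes."""
    levels = 2 ** bits - 1
    step = (hi - lo) / levels
    return [lo + code * step for code in codes]


vec = [-1.0, -0.25, 0.0, 0.5, 1.0]
codes = sq_encode(vec, -1.0, 1.0)
restored = sq_decode(codes, -1.0, 1.0)
max_error = max(abs(a - b) for a, b in zip(vec, restored))
print(codes, f"max error: {max_error:.5f}")
```

Each dimension now needs one byte instead of a four-byte float, and the reconstruction error stays within half a quantization step. Fewer bits (SQ6, SQ4) shrink storage further at the cost of a larger step.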

Common Configuration Examples

IVF,SQ8 - Auto centroids, 8-bit scalar quantization (good default)
IVF1000,SQ6 - 1000 centroids, 6-bit scalar quantization
IVF,PQ32x8 - Auto centroids, 32 subquantizers with 8 bits
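To check your reading of the description format, the following sketch parses the example strings above into their parts. `parse_index_description` is a hypothetical helper written for this tutorial, not part of the Couchbase SDK or LangChain, and it only covers the forms listed here; Couchbase performs its own validation when the index is created.

```python
import re

# IVF with optional centroid count, then either SQ<bits> or PQ<subquantizers>x<bits>.
_DESCRIPTION = re.compile(
    r"^IVF(?P<centroids>\d*),"
    r"(?:SQ(?P<sq_bits>\d+)|PQ(?P<subquantizers>\d+)x(?P<pq_bits>\d+))$"
)


def parse_index_description(description):
    """Split an index description such as 'IVF1000,SQ6' into its settings."""
    match = _DESCRIPTION.match(description)
    if match is None:
        raise ValueError(f"Unrecognized index description: {description!r}")
    # An empty centroid count means Couchbase picks the number automatically.
    centroids = int(match["centroids"]) if match["centroids"] else None
    if match["sq_bits"]:
        quantization = {"type": "SQ", "bits": int(match["sq_bits"])}
    else:
        quantization = {
            "type": "PQ",
            "subquantizers": int(match["subquantizers"]),
            "bits": int(match["pq_bits"]),
        }
    return {"centroids": centroids, "quantization": quantization}


for text in ("IVF,SQ8", "IVF1000,SQ6", "IVF,PQ32x8"):
    print(text, "->", parse_index_description(text))
```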

For detailed configuration options, see the Quantization & Centroid Settings.

For more information on query-based vector indexes, see Couchbase Vector Index Documentation.

Our Configuration Choice

In this tutorial, we use IVF,SQ8 which provides:

Auto-selected centroids optimized for our dataset size
8-bit scalar quantization for good balance of speed, memory usage, and accuracy
COSINE distance metric ideal for semantic similarity search
Optimal performance for most semantic search use cases

Setting Up the Couchbase Query Vector Store

A vector store is where we'll keep our embeddings. The query vector store is specifically designed to handle embeddings and perform similarity searches. When a user inputs a query, the embedding model (here, Cohere) converts it into an embedding, and the Couchbase Query Service compares that embedding against the embeddings stored in the vector store. This allows the engine to find documents that are semantically similar to the query, even if they don't contain the exact same words. By setting up the vector store in Couchbase, we create a powerful tool that enables us to understand and retrieve information based on the meaning and context of the query, rather than just the specific words used.

The vector store requires a distance metric to determine how similarity between vectors is calculated. This is crucial for accurate semantic search results, as different distance metrics can yield different similarity rankings. The supported distance strategies include dot, l2, euclidean, cosine, l2_squared, and euclidean_squared. In our implementation we will use cosine, which is particularly effective for text embeddings.

try:
    vector_store = CouchbaseQueryVectorStore(
        cluster=cluster,
        bucket_name=CB_BUCKET_NAME,
        scope_name=SCOPE_NAME,
        collection_name=COLLECTION_NAME,
        embedding=embeddings,
        distance_metric=DistanceStrategy.COSINE
    )
    logging.info("Successfully created vector store")
except Exception as e:
    raise ValueError(f"Failed to create vector store: {str(e)}")

2026-02-03 11:19:52,055 - INFO - Successfully created vector store
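The claim that different metrics can produce different rankings can be checked with two hand-made vectors. This is a small sketch in plain Python with invented values; the candidate names are assumptions for illustration.

```python
import math


def dot(a, b):
    """Dot product: larger means more similar."""
    return sum(x * y for x, y in zip(a, b))


def l2_distance(a, b):
    """Euclidean distance: smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 - cosine similarity: smaller means more similar, ignores length."""
    return 1.0 - dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


query_vec = [1.0, 0.0]
candidates = {
    "short_aligned": [0.5, 0.0],   # same direction as the query, small magnitude
    "long_off_axis": [3.0, 1.0],   # slightly different direction, large magnitude
}

rank_by_cosine = sorted(candidates, key=lambda n: cosine_distance(query_vec, candidates[n]))
rank_by_l2 = sorted(candidates, key=lambda n: l2_distance(query_vec, candidates[n]))
rank_by_dot = sorted(candidates, key=lambda n: dot(query_vec, candidates[n]), reverse=True)
print(rank_by_cosine, rank_by_l2, rank_by_dot)
```

Cosine and Euclidean distance both prefer the aligned vector, while the dot product prefers the longer one because magnitude contributes to the score. Cosine ignores magnitude entirely, which is one reason it is a common choice for text embeddings.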

Load the BBC News Dataset

To build a search engine, we need data to search through. We use the BBC News dataset from RealTimeData, which provides real-world news articles. This dataset contains news articles from BBC covering various topics and time periods. Loading the dataset is a crucial step because it provides the raw material that our search engine will work with. The quality and diversity of the news articles make it an excellent choice for testing and refining our search engine, ensuring it can handle real-world news content effectively.

The BBC News dataset allows us to work with authentic news articles, enabling us to build and test a search engine that can effectively process and retrieve relevant news content. The dataset is loaded using the Hugging Face datasets library, specifically accessing the "RealTimeData/bbc_news_alltime" dataset with the "2024-12" version.

try:
    news_dataset = load_dataset(
        "RealTimeData/bbc_news_alltime", "2024-12", split="train"
    )
    print(f"Loaded the BBC News dataset with {len(news_dataset)} rows")
    logging.info(f"Successfully loaded the BBC News dataset with {len(news_dataset)} rows.")
except Exception as e:
    raise ValueError(f"Error loading the BBC News dataset: {str(e)}")

2026-02-03 11:19:55,428 - INFO - Successfully loaded the BBC News dataset with 2687 rows.


Loaded the BBC News dataset with 2687 rows

Cleaning up the Data

We will use the content of the news articles for our RAG system.

The dataset contains a few duplicate records. We are removing them to avoid duplicate results in the retrieval stage of our RAG system.

news_articles = news_dataset["content"]
unique_articles = set()
for article in news_articles:
    if article:
        unique_articles.add(article)
unique_news_articles = list(unique_articles)
print(f"We have {len(unique_news_articles)} unique articles in our database.")

We have 1749 unique articles in our database.
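One side effect of using a set is that the original article order is lost. If order matters for your use case, the following sketch shows an order-preserving variant; `unique_non_empty` is a hypothetical helper and the sample strings are made up.

```python
def unique_non_empty(texts):
    """Drop empty values and duplicates while keeping first-seen order."""
    # dict keys are unique and remember insertion order.
    return list(dict.fromkeys(text for text in texts if text))


sample = ["story A", "", "story B", "story A", None, "story C", "story B"]
deduped = unique_non_empty(sample)
print(deduped)
```

A stable order also makes runs reproducible, since the same input always produces the same ingestion sequence.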

Saving Data to the Vector Store

To efficiently handle the large number of articles, we process them in batches of 50 articles at a time. This batch processing approach helps manage memory usage and provides better control over the ingestion process.

We first filter out any articles that exceed 50,000 characters to avoid potential issues with token limits. Then, using the vector store's add_texts method, we add the filtered articles to our vector database. The batch_size parameter controls how many articles are processed in each iteration.

This approach offers several benefits:

Memory Efficiency: Processing in smaller batches prevents memory overload
Progress Tracking: Easier to monitor and track the ingestion progress
Resource Management: Better control over CPU and network resource utilization

We use a conservative batch size of 50 to ensure reliable operation. The optimal batch size depends on many factors including:

Document sizes being inserted
Available system resources
Network conditions
Concurrent workload

Consider measuring performance with your specific workload before adjusting.

batch_size = 50

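# Keep only non-empty articles of at most 50,000 characters to avoid token-limit issues;
# add_texts below splits them into batches of batch_size automatically
articles = [article for article in unique_news_articles if article and len(article) <= 50000]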

try:
    vector_store.add_texts(
        texts=articles,
        batch_size=batch_size
    )
    logging.info("Document ingestion completed successfully.")
except Exception as e:
    raise ValueError(f"Failed to save documents to vector store: {str(e)}")

2026-02-03 11:20:27,699 - INFO - Document ingestion completed successfully.
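The tutorial relies on add_texts to do the batching, so you never see the individual batches. As a rough picture of what batching means, here is a minimal sketch of splitting a list into fixed-size chunks; the `batched` helper and the placeholder article strings are assumptions for illustration, not the library's internal code.

```python
def batched(items, batch_size):
    """Yield consecutive slices of items, each at most batch_size long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Placeholder data standing in for the filtered news articles.
demo_articles = [f"article {i}" for i in range(120)]
batch_lengths = [len(batch) for batch in batched(demo_articles, 50)]
print(batch_lengths)
```

A loop like this is also a natural place to log progress or retry a single failed batch without restarting the whole ingestion.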

Create Language Model (LLM)

The script initializes a Cohere language model (LLM) that will be used for generating responses to queries. LLMs are powerful tools for natural language understanding and generation, capable of producing human-like text based on input prompts. The model is configured with specific parameters, such as the temperature, which controls the randomness of its outputs.

try:
    llm = ChatCohere(
        cohere_api_key=COHERE_API_KEY,
        model="command-a-03-2025",
        temperature=0
    )
    logging.info("Successfully created Cohere LLM with model command")
except Exception as e:
    raise ValueError(f"Error creating Cohere LLM: {str(e)}")

2026-02-03 11:20:27,712 - INFO - Successfully created Cohere LLM with model command

Understanding Semantic Search in Couchbase

Semantic search goes beyond traditional keyword matching by understanding the meaning and context behind queries. Here's how it works in Couchbase:

How Semantic Search Works

Vector Embeddings: Documents and queries are converted into high-dimensional vectors using an embeddings model (in our case, Cohere's embed-english-v3.0)
Similarity Calculation: When a query is made, Couchbase compares the query vector against stored document vectors using the COSINE distance metric
Result Ranking: Documents are ranked by their vector distance (lower distance = more similar meaning)
Flexible Configuration: Different distance metrics (cosine, euclidean, dot product) and embedding models can be used based on your needs

The similarity_search_with_score method performs this entire process, returning documents along with their similarity scores. This enables you to find semantically related content even when exact keywords don't match.
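The embed, compare, and rank steps above can be sketched end to end with a toy stand-in for the embedding model. Everything here is an assumption for illustration: `toy_embed` simply counts a few vocabulary words, which is far cruder than a real model, and `search_with_score` is a brute-force scan rather than an index lookup. It does return (document, distance) pairs in ascending distance order, the same shape the tutorial prints.

```python
import heapq
import math

# Tiny invented vocabulary; a real embedding model needs no such list.
VOCAB = ["guardiola", "city", "market", "shares"]


def toy_embed(text):
    """Count vocabulary words: a crude stand-in for an embedding model."""
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]


def cosine_distance(a, b):
    """1 - cosine similarity; returns 1.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def search_with_score(query, documents, k):
    """Embed the query, score every document, and keep the k closest."""
    query_vec = toy_embed(query)
    scored = [(cosine_distance(query_vec, toy_embed(doc)), doc) for doc in documents]
    return [(doc, dist) for dist, doc in heapq.nsmallest(k, scored)]


documents = [
    "guardiola says city must improve",
    "market shares fall sharply",
    "city market opens",
]
results = search_with_score("guardiola city", documents, k=2)
for doc, dist in results:
    print(f"Distance: {dist:.4f}, Text: {doc}")
```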

Now let's see semantic search in action and measure its performance with different optimization strategies.

Vector Search Performance Testing

Now let's measure and compare the performance benefits of different optimization strategies. We'll conduct a comprehensive performance analysis across two phases:

Performance Testing Phases:

Phase 1 - Baseline Performance: Test vector search without Hyperscale indexes to establish baseline metrics
Phase 2 - Hyperscale-Optimized Search: Create Hyperscale index and measure performance improvements

Important Context:

Hyperscale performance benefits scale with dataset size and concurrent load
With our dataset (~1,700 articles), improvements may be modest
Production environments with millions of vectors show significant Hyperscale advantages
The combination of Hyperscale + LLM caching provides optimal RAG performance
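A single timed call can be noisy, for example when the first request includes connection warm-up. One hedged way to get a steadier number is to repeat the call and report the median, as in the sketch below. `time_call` is a hypothetical helper, and the summation stands in for the search call so that the example runs without a cluster.

```python
import statistics
import time


def time_call(fn, repeats=5):
    """Run fn several times; return its last result and the median duration."""
    durations = []
    result = None
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, high-resolution timer
        result = fn()
        durations.append(time.perf_counter() - start)
    return result, statistics.median(durations)


# Stand-in workload; in this tutorial you could pass something like
# lambda: vector_store.similarity_search_with_score(query, k=10)
result, median_seconds = time_call(lambda: sum(range(100_000)))
print(f"result={result}, median={median_seconds:.6f} seconds")
```

The median is less sensitive to one slow outlier than the mean, which makes before-and-after comparisons on a small dataset easier to interpret.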

Phase 1: Baseline Performance (No Hyperscale Index)

query = "What was manchester city manager pep guardiola's reaction to the team's current form?"

try:
    # Perform the semantic search
    start_time = time.time()
    search_results = vector_store.similarity_search_with_score(query, k=10)
    baseline_time = time.time() - start_time

    logging.info(f"Baseline search completed in {baseline_time:.2f} seconds")

    # Display search results
    print(f"\nBaseline Semantic Search Results (completed in {baseline_time:.2f} seconds):")
    print("-" * 80)
    for doc, score in search_results:
        print(f"Distance: {score:.4f}, Text: {doc.page_content[:200]}...")
        print("-" * 80)

except CouchbaseException as e:
    raise RuntimeError(f"Error performing semantic search: {str(e)}")
except Exception as e:
    raise RuntimeError(f"Unexpected error: {str(e)}")

2026-02-03 11:20:28,490 - INFO - Baseline search completed in 0.77 seconds



Baseline Semantic Search Results (completed in 0.77 seconds):
--------------------------------------------------------------------------------
Distance: 0.3359, Text: Manchester City boss Pep Guardiola has won 18 trophies since he arrived at the club in 2016

Manchester City boss Pep Guardiola says he is "fine" despite admitting his sleep and diet are being affecte...
--------------------------------------------------------------------------------
Distance: 0.3477, Text: 'We have to find a way' - Guardiola vows to end relegation form

This video can not be played To play this video you need to enable JavaScript in your browser. 'Worrying' and 'staggering' - Why do Man...
--------------------------------------------------------------------------------
Distance: 0.3677, Text: 'Self-doubt, errors & big changes' - inside the crisis at Man City

Pep Guardiola has not been through a moment like this in his managerial career. Manchester City have lost nine matches in their past...
--------------------------------------------------------------------------------
Distance: 0.3837, Text: 'I am not good enough' - Guardiola faces daunting and major rebuild

This video can not be played To play this video you need to enable JavaScript in your browser. 'I am not good enough' - Guardiola s...
--------------------------------------------------------------------------------
Distance: 0.4270, Text: Pep Guardiola has said Manchester City will be his final managerial job in club football before he "maybe" coaches a national team.

The former Barcelona and Bayern Munich boss has won 15 major trophi...
--------------------------------------------------------------------------------
Distance: 0.4493, Text: Man City might miss out on Champions League - Guardiola

Erling Haaland was part of the Manchester City side that won the Champions League for the first time in 2023

Manchester City boss Pep Guardiol...
--------------------------------------------------------------------------------
Distance: 0.4543, Text: 'Life is not easy' - Haaland penalty miss sums up Man City crisis

Manchester City striker Erling Haaland has now missed two of his 17 penalties taken in the Premier League

Nothing seems to be going ...
--------------------------------------------------------------------------------
Distance: 0.4814, Text: 'So happy he is back' - 'integral' De Bruyne 'one of best we've seen'

This video can not be played To play this video you need to enable JavaScript in your browser. Match of the Day: How Kevin de Bru...
--------------------------------------------------------------------------------
Distance: 0.5038, Text: Liverpool boss Arne Slot says his Liverpool side "came close to perfection" in their win against Manchester City.

MATCH REPORT: Liverpool beat Man City to go nine points clear at top of Premier Leagu...
--------------------------------------------------------------------------------
Distance: 0.5332, Text: Man City's Dias ruled out for 'three or four weeks'

Ruben Dias has won 10 major trophies during his time at Manchester City

Manchester City have suffered a fresh injury blow with manager Pep Guardio...
--------------------------------------------------------------------------------

Creating the Hyperscale Index

Now that we understand the different index types and configuration options (covered in the "Understanding Hyperscale and Composite Vector Search" section above), let's create a Hyperscale index for our vector store. The create_index method takes an index type (HYPERSCALE or COMPOSITE) and an index_description parameter for the optimization settings.

vector_store.create_index(index_type=IndexType.HYPERSCALE, index_name="cohere_hyperscale_index", index_description="IVF,SQ8")

Note: To create a Composite index instead, use the code below. Choose the index type based on your use case and query patterns. For this tutorial's news search scenario, either index type would work, but Hyperscale is more efficient for pure semantic search across news articles.

vector_store.create_index(index_type=IndexType.COMPOSITE, index_name="cohere_composite_index", index_description="IVF,SQ8")

Phase 2: Hyperscale-Optimized Performance

query = "What was manchester city manager pep guardiola's reaction to the team's current form?"

try:
    # Perform the semantic search with Hyperscale index
    start_time = time.time()
    search_results = vector_store.similarity_search_with_score(query, k=10)
    hyperscale_time = time.time() - start_time

    logging.info(f"Hyperscale search completed in {hyperscale_time:.2f} seconds")

    # Display search results
    print(f"\nHyperscale Semantic Search Results (completed in {hyperscale_time:.2f} seconds):")
    print("-" * 80)
    for doc, score in search_results:
        print(f"Distance: {score:.4f}, Text: {doc.page_content[:200]}...")
        print("-" * 80)

except CouchbaseException as e:
    raise RuntimeError(f"Error performing semantic search: {str(e)}") from e
except Exception as e:
    raise RuntimeError(f"Unexpected error: {str(e)}") from e

2026-02-03 11:20:32,771 - INFO - Hyperscale search completed in 0.32 seconds



Hyperscale Semantic Search Results (completed in 0.32 seconds):
--------------------------------------------------------------------------------
Distance: 0.3359, Text: Manchester City boss Pep Guardiola has won 18 trophies since he arrived at the club in 2016

Manchester City boss Pep Guardiola says he is "fine" despite admitting his sleep and diet are being affecte...
--------------------------------------------------------------------------------
Distance: 0.3477, Text: 'We have to find a way' - Guardiola vows to end relegation form

This video can not be played To play this video you need to enable JavaScript in your browser. 'Worrying' and 'staggering' - Why do Man...
--------------------------------------------------------------------------------
Distance: 0.3677, Text: 'Self-doubt, errors & big changes' - inside the crisis at Man City

Pep Guardiola has not been through a moment like this in his managerial career. Manchester City have lost nine matches in their past...
--------------------------------------------------------------------------------
Distance: 0.3837, Text: 'I am not good enough' - Guardiola faces daunting and major rebuild

This video can not be played To play this video you need to enable JavaScript in your browser. 'I am not good enough' - Guardiola s...
--------------------------------------------------------------------------------
Distance: 0.4270, Text: Pep Guardiola has said Manchester City will be his final managerial job in club football before he "maybe" coaches a national team.

The former Barcelona and Bayern Munich boss has won 15 major trophi...
--------------------------------------------------------------------------------
Distance: 0.4493, Text: Man City might miss out on Champions League - Guardiola

Erling Haaland was part of the Manchester City side that won the Champions League for the first time in 2023

Manchester City boss Pep Guardiol...
--------------------------------------------------------------------------------
Distance: 0.4543, Text: 'Life is not easy' - Haaland penalty miss sums up Man City crisis

Manchester City striker Erling Haaland has now missed two of his 17 penalties taken in the Premier League

Nothing seems to be going ...
--------------------------------------------------------------------------------
Distance: 0.4814, Text: 'So happy he is back' - 'integral' De Bruyne 'one of best we've seen'

This video can not be played To play this video you need to enable JavaScript in your browser. Match of the Day: How Kevin de Bru...
--------------------------------------------------------------------------------
Distance: 0.5038, Text: Liverpool boss Arne Slot says his Liverpool side "came close to perfection" in their win against Manchester City.

MATCH REPORT: Liverpool beat Man City to go nine points clear at top of Premier Leagu...
--------------------------------------------------------------------------------
Distance: 0.5332, Text: Man City's Dias ruled out for 'three or four weeks'

Ruben Dias has won 10 major trophies during his time at Manchester City

Manchester City have suffered a fresh injury blow with manager Pep Guardio...
--------------------------------------------------------------------------------
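The results are printed in ascending order of distance, so lower values appear to indicate closer matches. If you want to pass only sufficiently close matches downstream, a small filter helps. The sketch below uses distances copied from the output above with shortened titles; the helper name and the 0.45 cutoff are arbitrary assumptions, not values recommended by Couchbase or Cohere.

```python
def filter_by_distance(results, max_distance):
    """Keep (item, distance) pairs whose distance is at or below the cutoff."""
    return [(item, dist) for item, dist in results if dist <= max_distance]

# Distances copied from the output above; titles shortened for readability.
sample_results = [
    ("Guardiola says he is 'fine'", 0.3359),
    ("Guardiola vows to end relegation form", 0.3477),
    ("Inside the crisis at Man City", 0.3677),
    ("Liverpool 'came close to perfection'", 0.5038),
    ("Dias ruled out", 0.5332),
]

# 0.45 is an arbitrary cutoff chosen for this example only.
relevant = filter_by_distance(sample_results, max_distance=0.45)
for title, dist in relevant:
    print(f"{dist:.4f}  {title}")
```

In the notebook, the same helper could be applied directly to search_results, since similarity_search_with_score returns (document, score) pairs.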

Performance Analysis Summary

Let's compare the baseline search time with the search time measured after creating the Hyperscale index:

print("\n" + "="*60)
print("PERFORMANCE SUMMARY")
print("="*60)

print(f"Baseline Search Time:     {baseline_time:.4f} seconds")

if baseline_time and hyperscale_time:
    speedup = baseline_time / hyperscale_time if hyperscale_time > 0 else float('inf')
    percent_improvement = ((baseline_time - hyperscale_time) / baseline_time) * 100 if baseline_time > 0 else 0
    print(f"Hyperscale Search Time:   {hyperscale_time:.4f} seconds ({speedup:.2f}x faster, {percent_improvement:.1f}% improvement)")

print("\n" + "-"*60)
print("Index Recommendation:")
print("-"*60)
print("- Hyperscale: Best for pure vector searches, scales to billions of vectors")
print("- Composite: Best for filtered searches combining vector + scalar filters")

============================================================
PERFORMANCE SUMMARY
============================================================
Baseline Search Time:     0.7732 seconds
Hyperscale Search Time:   0.3220 seconds (2.40x faster, 58.4% improvement)

------------------------------------------------------------
Index Recommendation:
------------------------------------------------------------
- Hyperscale: Best for pure vector searches, scales to billions of vectors
- Composite: Best for filtered searches combining vector + scalar filters
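A single timed call can be noisy, because network latency and warm-up effects vary between runs. One way to get a steadier number is to repeat the call and take the median. The sketch below shows that idea together with the same speedup arithmetic used in the summary; median_latency and speedup_summary are hypothetical helper names, and the workload is a stand-in rather than a real search.

```python
import statistics
import time

def median_latency(fn, runs=5):
    """Call fn() several times and return the median wall-clock time in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)

def speedup_summary(baseline, optimized):
    """Return (speedup factor, percent improvement) for two latencies."""
    speedup = baseline / optimized if optimized > 0 else float("inf")
    percent = (baseline - optimized) / baseline * 100 if baseline > 0 else 0.0
    return speedup, percent

# Stand-in workload; in the notebook this would wrap the search call instead.
dummy_latency = median_latency(lambda: sum(range(10_000)))

# The two timings reported in the summary above.
speedup, percent = speedup_summary(0.7732, 0.3220)
print(f"{speedup:.2f}x faster, {percent:.1f}% improvement")
```

With the notebook's objects, the call might look like median_latency(lambda: vector_store.similarity_search_with_score(query, k=10)).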

Set Up LLM Response Cache

A cache is set up in Couchbase to store language model responses. Caching improves performance by avoiding repeated computation of the same result. The cache is bound to a specific Couchbase collection, and set_llm_cache registers it so that later language model calls in this notebook store and reuse their results.

try:
    cache = CouchbaseCache(
        cluster=cluster,
        bucket_name=CB_BUCKET_NAME,
        scope_name=SCOPE_NAME,
        collection_name=CACHE_COLLECTION,
    )
    set_llm_cache(cache)
    logging.info("Successfully created cache")
except Exception as e:
    raise ValueError(f"Failed to create cache: {str(e)}") from e

2026-02-03 11:20:32,784 - INFO - Successfully created cache
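To make the caching behaviour concrete, here is a minimal in-memory sketch of an LLM cache. It is modeled loosely on the lookup/update shape that LangChain cache classes expose, keyed by the prompt plus a string identifying the model. ToyLLMCache, fake_llm, and cached_generate are hypothetical names for illustration; this is not how CouchbaseCache is implemented.

```python
class ToyLLMCache:
    """In-memory stand-in that mimics the lookup/update shape of an LLM cache."""

    def __init__(self):
        self._store = {}

    def lookup(self, prompt, llm_string):
        # Returns None on a cache miss.
        return self._store.get((prompt, llm_string))

    def update(self, prompt, llm_string, return_val):
        self._store[(prompt, llm_string)] = return_val

llm_calls = []  # records every time the (fake) model is actually invoked

def fake_llm(prompt):
    """Hypothetical stand-in for an expensive language model call."""
    llm_calls.append(prompt)
    return f"answer to: {prompt}"

def cached_generate(cache, prompt, llm_string="toy-model"):
    """Return a cached answer when available, otherwise call the model and store it."""
    hit = cache.lookup(prompt, llm_string)
    if hit is not None:
        return hit
    result = fake_llm(prompt)
    cache.update(prompt, llm_string, result)
    return result

cache_demo = ToyLLMCache()
first = cached_generate(cache_demo, "Who manages Manchester City?")
second = cached_generate(cache_demo, "Who manages Manchester City?")
print(first == second, len(llm_calls))  # True 1
```

The second call returns the stored answer without invoking the model again, which is the effect the Couchbase-backed cache provides across notebook runs.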

Retrieval-Augmented Generation (RAG) with Couchbase and LangChain

Couchbase and LangChain can be seamlessly integrated to create RAG (Retrieval-Augmented Generation) chains, enhancing the process of generating contextually relevant responses. In this setup, Couchbase serves as the vector store, where embeddings of documents are stored. When a query is made, LangChain retrieves the most relevant documents from Couchbase by comparing the query’s embedding with the stored document embeddings. These documents, which provide contextual information, are then passed to a generative language model within LangChain.

The language model, equipped with the context from the retrieved documents, generates a response that is both informed and contextually accurate. This integration allows the RAG chain to leverage Couchbase’s efficient storage and retrieval capabilities, while LangChain handles the generation of responses based on the context provided by the retrieved documents. Together, they create a powerful system that can deliver highly relevant and accurate answers by combining the strengths of both retrieval and generation.

try:
    template = """You are a helpful bot. If you cannot answer based on the context provided, respond with a generic answer. Answer the question as truthfully as possible using the context below:
    {context}

    Question: {question}"""
    prompt = ChatPromptTemplate.from_template(template)

    rag_chain = (
        {"context": vector_store.as_retriever(), "question": RunnablePassthrough()}
        | prompt
        | llm
        | StrOutputParser()
    )
    logging.info("Successfully created RAG chain")
except Exception as e:
    raise ValueError(f"Error creating RAG chain: {str(e)}")

2026-02-03 11:20:32,788 - INFO - Successfully created RAG chain
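In the chain above, the retriever's output is placed into the {context} slot of the prompt. A common refinement is to join the retrieved documents' text into one plain string first, so the prompt contains only article text. The sketch below shows that formatting step in isolation; Doc is a hypothetical stand-in for a retrieved document, format_docs is an illustrative helper, and the sample texts are placeholders.

```python
from dataclasses import dataclass

@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document (only page_content is used)."""
    page_content: str

def format_docs(docs):
    """Join document texts with blank lines so the prompt sees plain text."""
    return "\n\n".join(doc.page_content for doc in docs)

TEMPLATE = """Answer the question using the context below:
{context}

Question: {question}"""

# Placeholder texts standing in for retrieved news articles.
docs = [Doc("Article one text."), Doc("Article two text.")]

prompt_text = TEMPLATE.format(
    context=format_docs(docs),
    question="What do the articles say?",
)
print(prompt_text)
```

In a LangChain chain, a helper like this could be piped after the retriever, for example "context": vector_store.as_retriever() | format_docs, although whether that improves answers for this dataset is untested here.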

start_time = time.time()
try:
    rag_response = rag_chain.invoke(query)
    rag_elapsed_time = time.time() - start_time
    print(f"RAG Response: {rag_response}")
    print(f"RAG response generated in {rag_elapsed_time:.2f} seconds")
except InternalServerFailureException as e:
    if "query request rejected" in str(e):
        print("Error: Search request was rejected due to rate limiting. Please try again later.")
    else:
        print(f"Internal server error occurred: {str(e)}")
except Exception as e:
    print(f"Unexpected error occurred: {str(e)}")

RAG Response: Manchester City manager Pep Guardiola has expressed concern and frustration over the team's recent poor form. He has acknowledged the impact of the team's struggles on his personal well-being, stating that his sleep and diet have been affected. Guardiola described his state of mind as "ugly" and admitted to feeling more uncomfortable when the team is not performing well.

In response to the team's decline, Guardiola has emphasized the need for better defense and avoiding mistakes at both ends of the pitch. He has also highlighted the importance of bringing back injured players and finding a way to return to winning ways. Despite the challenges, Guardiola remains committed to finding solutions and has expressed trust in his players' pride and desire to improve.

Guardiola's reaction to the team's form has been one of self-reflection and determination, as he works to address the issues and guide the team back to success.
RAG response generated in 5.05 seconds
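The error handler above asks the user to try again later when a request is rejected. For unattended runs, that retry can be automated with exponential backoff. The following is a minimal sketch under stated assumptions: TransientError is a hypothetical stand-in for a retryable failure, and call_with_retry and flaky are illustrative names, not part of the Couchbase SDK or LangChain.

```python
import time

class TransientError(Exception):
    """Hypothetical stand-in for a retryable failure such as a rejected request."""

def call_with_retry(fn, max_attempts=3, base_delay=0.01):
    """Call fn(), retrying on TransientError with exponentially growing delays."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the original error
            time.sleep(base_delay * 2 ** (attempt - 1))

attempts = []  # records each invocation of the flaky function

def flaky():
    """Fails twice, then succeeds, to exercise the retry loop."""
    attempts.append(1)
    if len(attempts) < 3:
        raise TransientError("query request rejected")
    return "ok"

result = call_with_retry(flaky)
print(result, len(attempts))  # ok 3
```

In the notebook, the wrapped call would be something like lambda: rag_chain.invoke(query), with the except clause catching InternalServerFailureException and checking for the "query request rejected" message instead of TransientError. A longer base_delay would be more realistic for real rate limits.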

Demonstrating Cache Benefits

Couchbase can serve as a cache for the language model responses produced by a RAG (Retrieval-Augmented Generation) chain. This improves efficiency and speed when the same query is asked repeatedly. When a query is first processed, the RAG chain retrieves relevant documents, generates a response using the language model, and stores that response in Couchbase. Because the cache was registered with set_llm_cache, each entry is keyed by the prompt sent to the model (together with the model configuration) rather than by the raw query alone.

For a subsequent request with the same query, the chain still runs retrieval and builds the prompt. If that prompt matches a cached entry, the response is read directly from Couchbase and the language model call, usually the slowest step, is skipped. This significantly reduces response time. Couchbase's role in this setup is to provide fast, scalable storage for these responses so that frequently asked queries can be answered more quickly.

try:
    queries = [
        "What happened in the match between Fullham and Liverpool?",
        "What was manchester city manager pep guardiola's reaction to the team's current form?", # Repeated query
        "What happened in the match between Fullham and Liverpool?", # Repeated query
    ]

    for i, query in enumerate(queries, 1):
        print(f"\nQuery {i}: {query}")
        start_time = time.time()
        response = rag_chain.invoke(query)
        elapsed_time = time.time() - start_time
        print(f"Response: {response}")
        print(f"Time taken: {elapsed_time:.2f} seconds")
except InternalServerFailureException as e:
    if "query request rejected" in str(e):
        print("Error: Search request was rejected due to rate limiting. Please try again later.")
    else:
        print(f"Internal server error occurred: {str(e)}")
except Exception as e:
    print(f"Unexpected error occurred: {str(e)}")

Query 1: What happened in the match between Fullham and Liverpool?
Response: In the match between Fulham and Liverpool, Liverpool played with 10 men for 89 minutes after Andy Robertson received a red card in the 17th minute. Despite this numerical disadvantage, Liverpool managed to earn a 2-2 draw at Anfield. Fulham took the lead twice, but Liverpool responded both times, with Diogo Jota scoring an 86th-minute equalizer. The performance highlighted Liverpool's resilience and title credentials, with Fulham's Antonee Robinson praising Liverpool for not seeming like they were a man down. Liverpool maintained over 60% possession and dominated attacking metrics, showcasing their ability to fight back under adversity.
Time taken: 3.16 seconds

Query 2: What was manchester city manager pep guardiola's reaction to the team's current form?
Response: Manchester City manager Pep Guardiola has expressed concern and frustration over the team's recent poor form. He has acknowledged the impact of the team's struggles on his personal well-being, stating that his sleep and diet have been affected. Guardiola described his state of mind as "ugly" and admitted to feeling more uncomfortable when the team is not performing well.

In response to the team's decline, Guardiola has emphasized the need for better defense and avoiding mistakes at both ends of the pitch. He has also highlighted the importance of bringing back injured players and finding a way to return to winning ways. Despite the challenges, Guardiola remains committed to finding solutions and has expressed trust in his players' pride and desire to improve.

Guardiola's reaction to the team's form has been one of self-reflection and determination, as he works to address the issues and guide the team back to success.
Time taken: 0.34 seconds

Query 3: What happened in the match between Fullham and Liverpool?
Response: In the match between Fulham and Liverpool, Liverpool played with 10 men for 89 minutes after Andy Robertson received a red card in the 17th minute. Despite this numerical disadvantage, Liverpool managed to earn a 2-2 draw at Anfield. Fulham took the lead twice, but Liverpool responded both times, with Diogo Jota scoring an 86th-minute equalizer. The performance highlighted Liverpool's resilience and title credentials, with Fulham's Antonee Robinson praising Liverpool for not seeming like they were a man down. Liverpool maintained over 60% possession and dominated attacking metrics, showcasing their ability to fight back under adversity.
Time taken: 0.32 seconds

Conclusion

You've built a high-performance semantic search engine using Couchbase Hyperscale/Composite indexes with Cohere and LangChain. For the Search Vector Index alternative, see the search_based tutorial.
