
RAG with CrewAI using Couchbase Hyperscale and Composite Vector Index

  • Learn to build a semantic search engine using Couchbase (https://www.couchbase.com) and agent-based RAG workflows powered by CrewAI (https://github.com/crewAIInc/crewAI).
  • Explore Couchbase Hyperscale and Composite Vector Indexes for high-performance vector search, including pure vector and filtered similarity queries.
  • Follow beginner-friendly, step-by-step instructions to build a fully functional semantic search system from scratch.


Introduction

In this guide, we will walk you through building a powerful semantic search engine using Couchbase as the backend database and CrewAI for agent-based RAG operations. CrewAI lets us create specialized agents that work together to handle different aspects of the RAG workflow, from document retrieval to response generation. The tutorial uses Couchbase's Hyperscale and Composite vector index capabilities, which offer high-performance vector search optimized for large-scale applications. It is designed to be beginner-friendly, with clear, step-by-step instructions that will equip you to create a fully functional semantic search system from scratch. Alternatively, if you want to perform semantic search using the Search Vector Index, please take a look at this tutorial.

How to Run This Tutorial

This tutorial is available as a Jupyter Notebook (.ipynb file) that you can run interactively. You can access the original notebook here.

You can either:

  • Download the notebook file and run it on Google Colab
  • Run it on your system by setting up the Python environment

Prerequisites

Couchbase Requirements

  1. Create and Deploy Your Free Tier Operational cluster on Capella
    • To get started with Couchbase Capella, create an account and use it to deploy a free tier operational cluster
    • This account provides you with an environment where you can explore and learn about Capella
    • To learn more, please follow the Getting Started Guide
    • Important: This tutorial requires Couchbase Server 8.0+ for Hyperscale and Composite vector index capabilities

Couchbase Capella Configuration

When running Couchbase using Capella, the following prerequisites need to be met:

  • Create the database credentials to access the required bucket (Read and Write) used in the application
  • Allow access to the Cluster from the IP on which the application is running by following the Network Security documentation

Setup and Installation

Installing Necessary Libraries

We'll install the following key libraries:

  • datasets: For loading and managing our training data
  • langchain-couchbase: To integrate Couchbase with LangChain for Hyperscale and Composite vector storage and caching
  • langchain-openai: For accessing OpenAI's embedding and chat models
  • crewai: To create and orchestrate our AI agents for RAG operations
  • python-dotenv: For securely managing environment variables and API keys

These libraries provide the foundation for building a semantic search engine with Hyperscale and Composite vector embeddings, database integration, and agent-based RAG capabilities.

%pip install --quiet datasets==4.1.0 langchain-couchbase==0.5.0 langchain-openai==0.3.33 crewai==0.186.1 python-dotenv==1.1.1
Note: you may need to restart the kernel to use updated packages.
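As a quick sanity check after installation, you can confirm the pinned packages actually resolved. This is a small sketch using only the standard library; the names passed in are the PyPI package names from the install command above.

```python
from importlib import metadata

def installed_versions(packages):
    """Map each PyPI package name to its installed version, or 'missing'."""
    versions = {}
    for pkg in packages:
        try:
            versions[pkg] = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            versions[pkg] = "missing"
    return versions

# Example: installed_versions(["datasets", "langchain-couchbase", "crewai"])
```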

Import Required Modules

The script starts by importing a series of libraries required for various tasks, including handling JSON, logging, time tracking, Couchbase connections, embedding generation, and dataset loading.

import getpass
import json
import logging
import os
import time
from datetime import timedelta
from uuid import uuid4

from couchbase.auth import PasswordAuthenticator
from couchbase.cluster import Cluster
from couchbase.diagnostics import PingState, ServiceType
from couchbase.exceptions import (InternalServerFailureException,
                                  QueryIndexAlreadyExistsException,
                                  ServiceUnavailableException,
                                  CouchbaseException)
from couchbase.management.buckets import CreateBucketSettings
from couchbase.options import ClusterOptions
from datasets import load_dataset
from dotenv import load_dotenv
from crewai.tools import tool
from langchain_couchbase.vectorstores import CouchbaseQueryVectorStore
from langchain_couchbase.vectorstores import DistanceStrategy, IndexType
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

from crewai import Agent, Crew, Process, Task

Configure Logging

Logging is configured to track the progress of the script and capture any errors or warnings.

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s [%(levelname)s] %(message)s',
    datefmt='%Y-%m-%d %H:%M:%S'
)

# Suppress httpx logging
logging.getLogger('httpx').setLevel(logging.CRITICAL)

Load Environment Configuration

In this section, we load the essential configuration settings the script needs, including sensitive information like database credentials and specific configuration names. Instead of hardcoding these details into the script, we read them from environment variables (prompting for the OpenAI API key if it is missing), ensuring flexibility and security.

The script uses environment variables to store sensitive information, enhancing the overall security and maintainability of your code by avoiding hardcoded values.

# Load environment variables
load_dotenv("./.env")

# Configuration
OPENAI_API_KEY = os.getenv('OPENAI_API_KEY') or input("Enter your OpenAI API key: ")
if not OPENAI_API_KEY:
    raise ValueError("OPENAI_API_KEY is not set")

CB_HOST = os.getenv('CB_HOST') or 'couchbase://localhost'
CB_USERNAME = os.getenv('CB_USERNAME') or 'Administrator'
CB_PASSWORD = os.getenv('CB_PASSWORD') or 'password'
CB_BUCKET_NAME = os.getenv('CB_BUCKET_NAME') or 'vector-search-testing'
SCOPE_NAME = os.getenv('SCOPE_NAME') or 'shared'
COLLECTION_NAME = os.getenv('COLLECTION_NAME') or 'crew'

print("Configuration loaded successfully")
Configuration loaded successfully
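One subtlety of the `os.getenv(...) or default` pattern above is that an empty string also falls through to the default. A small helper (a sketch, not part of the tutorial code) makes that behavior explicit and lets you mark settings as required:

```python
import os

def config_value(name, default=None, required=False):
    """Read a setting from the environment; empty values fall back to default."""
    value = os.environ.get(name) or default  # '' and None both fall through
    if required and not value:
        raise ValueError(f"{name} is not set")
    return value

# Usage sketch:
# CB_HOST = config_value("CB_HOST", default="couchbase://localhost")
# OPENAI_API_KEY = config_value("OPENAI_API_KEY", required=True)
```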

Couchbase Connection Setup

Connect to Cluster

Connecting to a Couchbase cluster is the foundation of our project. Couchbase will serve as our primary data store, handling all the storage and retrieval operations required for our semantic search engine. By establishing this connection, we enable our application to interact with the database, allowing us to perform operations such as storing embeddings, querying data, and managing collections. This connection is the gateway through which all data will flow, so ensuring it's set up correctly is paramount.

# Connect to Couchbase
try:
    auth = PasswordAuthenticator(CB_USERNAME, CB_PASSWORD)
    options = ClusterOptions(auth)
    cluster = Cluster(CB_HOST, options)
    cluster.wait_until_ready(timedelta(seconds=5))
    print("Successfully connected to Couchbase")
except Exception as e:
    print(f"Failed to connect to Couchbase: {str(e)}")
    raise
Successfully connected to Couchbase

Setup Collections

Create and configure Couchbase bucket, scope, and collection for storing our vector data.

  1. Bucket Creation:

    • Checks if specified bucket exists, creates it if not
    • Sets bucket properties like RAM quota (1024MB) and replication (disabled)
    • Note: If you are using Capella, create a bucket manually called vector-search-testing (or any name you prefer) with the same properties.
  2. Scope Management:

    • Verifies if requested scope exists within bucket
    • Creates new scope if needed (unless it's the default "_default" scope)
  3. Collection Setup:

    • Checks for collection existence within scope
    • Creates collection if it doesn't exist
    • Waits 2 seconds for collection to be ready

Additional Tasks:

  • Clears any existing documents for clean state
  • Implements comprehensive error handling and logging

The function is then called to set up the main collection that stores our vector embeddings.

def setup_collection(cluster, bucket_name, scope_name, collection_name):
    try:
        # Check if bucket exists, create if it doesn't
        try:
            bucket = cluster.bucket(bucket_name)
            logging.info(f"Bucket '{bucket_name}' exists.")
        except Exception as e:
            logging.info(f"Bucket '{bucket_name}' does not exist. Creating it...")
            bucket_settings = CreateBucketSettings(
                name=bucket_name,
                bucket_type='couchbase',
                ram_quota_mb=1024,
                flush_enabled=True,
                num_replicas=0
            )
            cluster.buckets().create_bucket(bucket_settings)
            time.sleep(2)  # Wait for bucket creation to complete and become available
            bucket = cluster.bucket(bucket_name)
            logging.info(f"Bucket '{bucket_name}' created successfully.")

        bucket_manager = bucket.collections()

        # Check if scope exists, create if it doesn't
        scopes = bucket_manager.get_all_scopes()
        scope_exists = any(scope.name == scope_name for scope in scopes)
        
        if not scope_exists and scope_name != "_default":
            logging.info(f"Scope '{scope_name}' does not exist. Creating it...")
            bucket_manager.create_scope(scope_name)
            logging.info(f"Scope '{scope_name}' created successfully.")

        # Check if collection exists, create if it doesn't
        collections = bucket_manager.get_all_scopes()
        collection_exists = any(
            scope.name == scope_name and collection_name in [col.name for col in scope.collections]
            for scope in collections
        )

        if not collection_exists:
            logging.info(f"Collection '{collection_name}' does not exist. Creating it...")
            bucket_manager.create_collection(scope_name, collection_name)
            logging.info(f"Collection '{collection_name}' created successfully.")
        else:
            logging.info(f"Collection '{collection_name}' already exists. Skipping creation.")

        # Wait for collection to be ready
        collection = bucket.scope(scope_name).collection(collection_name)
        time.sleep(2)  # Give the collection time to be ready for queries

        # Clear all documents in the collection
        try:
            query = f"DELETE FROM `{bucket_name}`.`{scope_name}`.`{collection_name}`"
            cluster.query(query).execute()
            logging.info("All documents cleared from the collection.")
        except Exception as e:
            logging.warning(f"Error while clearing documents: {str(e)}. The collection might be empty.")

        return collection
    except Exception as e:
        raise RuntimeError(f"Error setting up collection: {str(e)}")
    
setup_collection(cluster, CB_BUCKET_NAME, SCOPE_NAME, COLLECTION_NAME)
2025-10-06 10:17:53 [INFO] Bucket 'vector-search-testing' exists.
2025-10-06 10:17:53 [INFO] Collection 'crew' already exists. Skipping creation.
2025-10-06 10:17:55 [INFO] All documents cleared from the collection.





<couchbase.collection.Collection at 0x307407a10>
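The fixed `time.sleep(2)` calls in `setup_collection` are pragmatic shortcuts. If you see transient "collection not found" errors on slower clusters, a small polling helper is more robust. This is a sketch; the readiness probe you pass in (for example, a KV upsert-and-get on the new collection) is your choice.

```python
import time

def wait_until(predicate, timeout=10.0, interval=0.5):
    """Poll predicate() until it returns True; give up after timeout seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return False

# Usage sketch (collection_is_queryable is a hypothetical probe you define):
# wait_until(lambda: collection_is_queryable(collection), timeout=30)
```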

Understanding Hyperscale and Composite Vector Indexes

Hyperscale and Composite Vector Index Configuration

Semantic search with Hyperscale and Composite Vector Indexes requires creating indexes optimized for vector operations. Unlike Search Vector Index-based vector search, Hyperscale and Composite vector indexes offer two distinct types optimized for different use cases. Learn more about these index types in the Couchbase Vector Index Documentation.

Vector Index Types

Hyperscale Vector Indexes
  • Best for: Pure vector searches like content discovery, recommendations, and semantic search
  • Performance: High performance with low memory footprint, optimized for concurrent operations
  • Scalability: Designed to scale to billions of vectors
  • Use when: You primarily perform vector-only queries without complex scalar filtering
Composite Vector Indexes
  • Best for: Filtered vector searches that combine vector search with scalar value filtering
  • Performance: Efficient pre-filtering where scalar attributes reduce the vector comparison scope
  • Use when: Your queries combine vector similarity with scalar filters that eliminate large portions of data
  • Note: Scalar filters take precedence over vector similarity

Understanding Index Configuration

The index_description parameter controls how Couchbase optimizes vector storage and search through centroids and quantization:

Format: 'IVF[<centroids>],{PQ|SQ}<settings>'

Centroids (IVF - Inverted File):

  • Controls how the dataset is subdivided for faster searches
  • More centroids = faster search, slower training
  • Fewer centroids = slower search, faster training
  • If omitted (like IVF,SQ8), Couchbase auto-selects based on dataset size

Quantization Options:

  • SQ (Scalar Quantization): SQ4, SQ6, SQ8 (4, 6, or 8 bits per dimension)
  • PQ (Product Quantization): PQ<subquantizers>x<bits> (e.g., PQ32x8)
  • Higher values = better accuracy, larger index size

Common Examples:

  • IVF,SQ8 - Auto centroids, 8-bit scalar quantization (good default)
  • IVF1000,SQ6 - 1000 centroids, 6-bit scalar quantization
  • IVF,PQ32x8 - Auto centroids, 32 subquantizers with 8 bits

For detailed configuration options, see the Quantization & Centroid Settings.

For more information on Hyperscale and Composite vector indexes, see Couchbase Vector Index Documentation.
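As an illustration of the description grammar above (not an official parser), a small regex can validate a description string and pull out its parts:

```python
import re

# Matches 'IVF[<centroids>],SQ<bits>' or 'IVF[<centroids>],PQ<subquantizers>x<bits>'
DESC_RE = re.compile(r"^IVF(\d*),(SQ[468]|PQ\d+x\d+)$")

def parse_index_description(desc):
    """Split an index description into centroid count (None = auto) and quantization."""
    m = DESC_RE.match(desc)
    if not m:
        raise ValueError(f"Invalid index description: {desc}")
    centroids = int(m.group(1)) if m.group(1) else None
    return {"centroids": centroids, "quantization": m.group(2)}
```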

# Hyperscale and Composite Vector Index Configuration
# Unlike Search Vector Index, Hyperscale and Composite vector indexes are created programmatically through the vector store
# We'll configure the parameters that will be used for index creation

# Vector configuration
DISTANCE_STRATEGY = DistanceStrategy.COSINE  # Cosine similarity
INDEX_TYPE = IndexType.HYPERSCALE  # Using HYPERSCALE for high-performance vector search
INDEX_DESCRIPTION = "IVF,SQ8"  # Auto-selected centroids with 8-bit scalar quantization

# To create a Composite Index instead, use the following:
# INDEX_TYPE = IndexType.COMPOSITE  # Combines vector search with scalar filtering

print("Hyperscale and Composite vector index configuration prepared")
Hyperscale and Composite vector index configuration prepared

Alternative: Composite Index Configuration

If your use case requires complex filtering with scalar attributes, you can create a Composite index instead by changing the configuration:

# Alternative configuration for Composite index
INDEX_TYPE = IndexType.COMPOSITE  # Instead of IndexType.HYPERSCALE
INDEX_DESCRIPTION = "IVF,SQ8"     # Same quantization settings
DISTANCE_STRATEGY = DistanceStrategy.COSINE  # Same distance metric

# The rest of the setup remains identical

Use Composite indexes when:

  • You need to filter by document metadata or attributes before vector similarity
  • Your queries combine vector search with WHERE clauses
  • You have well-defined filtering requirements that can reduce the search space

Note: The index creation process is identical - just change the INDEX_TYPE. Composite indexes enable pre-filtering with scalar attributes, making them ideal for applications requiring complex query patterns with metadata filtering.
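For intuition, this is the general shape of SQL++ query a Composite index serves: a scalar predicate narrows the candidate set before vectors are compared. The field names (`category`, `embedding`) and the `$qvec` parameter are hypothetical placeholders, and the exact distance-function syntax may vary by server version, so treat this as a sketch rather than a copy-paste query.

```python
# Illustrative SQL++ shape served by a Composite vector index.
# Field names and the $qvec query-vector parameter are hypothetical.
filtered_query = """
SELECT META(d).id, d.text
FROM `vector-search-testing`.`shared`.`crew` AS d
WHERE d.category = "sport"  -- scalar pre-filter narrows the candidate set
ORDER BY APPROX_VECTOR_DISTANCE(d.embedding, $qvec, "COSINE")
LIMIT 4
"""
```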

OpenAI Configuration

This section initializes two key OpenAI components needed for our RAG system:

  1. OpenAI Embeddings:

    • Uses the 'text-embedding-3-small' model
    • Converts text into high-dimensional vector representations (embeddings)
    • These embeddings enable semantic search by capturing the meaning of text
    • Required for vector similarity search in Couchbase
  2. ChatOpenAI Language Model:

    • Uses the 'gpt-4o' model
    • Temperature set to 0.2 for balanced creativity and focus
    • Serves as the cognitive engine for CrewAI agents
    • Powers agent reasoning, decision-making, and task execution
    • Enables agents to:
      • Process and understand retrieved context from vector search
      • Generate thoughtful responses based on that context
      • Follow instructions defined in agent roles and goals
      • Collaborate with other agents in the crew
    • The relatively low temperature (0.2) ensures agents produce reliable, consistent outputs while maintaining some creative problem-solving ability

Both components require a valid OpenAI API key (OPENAI_API_KEY) for authentication. In the CrewAI framework, the LLM acts as the "brain" for each agent, allowing them to interpret tasks, retrieve relevant information via the RAG system, and generate appropriate outputs based on their specialized roles and expertise.

# Initialize OpenAI components
embeddings = OpenAIEmbeddings(
    openai_api_key=OPENAI_API_KEY,
    model="text-embedding-3-small"
)

llm = ChatOpenAI(
    openai_api_key=OPENAI_API_KEY,
    model="gpt-4o",
    temperature=0.2
)

print("OpenAI components initialized")
OpenAI components initialized

Document Processing and Vector Store Setup

Create Couchbase Hyperscale Vector Store

Set up the Hyperscale vector store where we'll store document embeddings for high-performance semantic search.

# Setup Hyperscale vector store with OpenAI embeddings
try:
    vector_store = CouchbaseQueryVectorStore(
        cluster=cluster,
        bucket_name=CB_BUCKET_NAME,
        scope_name=SCOPE_NAME,
        collection_name=COLLECTION_NAME,
        embedding=embeddings,
        distance_metric=DISTANCE_STRATEGY
    )
    print("Hyperscale Vector store initialized successfully")
    logging.info("Hyperscale Vector store setup completed")
except Exception as e:
    logging.error(f"Failed to initialize Hyperscale vector store: {str(e)}")
    raise RuntimeError(f"Hyperscale Vector store initialization failed: {str(e)}")
2025-10-06 10:18:05 [INFO] Hyperscale Vector store setup completed


Hyperscale Vector store initialized successfully

Load BBC News Dataset

To build a search engine, we need data to search through. We use the BBC News dataset from RealTimeData, which contains real-world BBC news articles covering various topics and time periods. Loading the dataset is a crucial step because it provides the raw material our search engine will work with, and the quality and diversity of the articles make it an excellent choice for testing and refining the engine on realistic news content.

The dataset is loaded using the Hugging Face datasets library, specifically the "RealTimeData/bbc_news_alltime" dataset with the "2024-12" version.

try:
    news_dataset = load_dataset(
        "RealTimeData/bbc_news_alltime", "2024-12", split="train"
    )
    print(f"Loaded the BBC News dataset with {len(news_dataset)} rows")
    logging.info(f"Successfully loaded the BBC News dataset with {len(news_dataset)} rows.")
except Exception as e:
    raise ValueError(f"Error loading the BBC News dataset: {str(e)}")
2025-10-06 10:18:13 [INFO] Successfully loaded the BBC News dataset with 2687 rows.


Loaded the BBC News dataset with 2687 rows

Data Cleaning

Remove duplicate articles for cleaner search results.

news_articles = news_dataset["content"]
unique_articles = set()
for article in news_articles:
    if article:
        unique_articles.add(article)
unique_news_articles = list(unique_articles)
print(f"We have {len(unique_news_articles)} unique articles in our database.")
We have 1749 unique articles in our database.
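Note that deduplicating through a `set` makes the final article order nondeterministic across runs. If you want reproducible ingestion order, an order-preserving variant is a small change (a sketch):

```python
def dedupe_preserving_order(items):
    """Drop falsy entries and duplicates while keeping first-seen order."""
    seen = set()
    unique = []
    for item in items:
        if item and item not in seen:
            seen.add(item)
            unique.append(item)
    return unique

# unique_news_articles = dedupe_preserving_order(news_dataset["content"])
```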

Save Data to Vector Store

To efficiently handle the large number of articles, we process them in batches of 50 articles at a time. This batch processing approach helps manage memory usage and provides better control over the ingestion process.

We first filter out any articles that exceed 50,000 characters to avoid potential issues with token limits. Then, using the vector store's add_texts method, we add the filtered articles to our vector database. The batch_size parameter controls how many articles are processed in each iteration.

This approach offers several benefits:

  1. Memory Efficiency: Processing in smaller batches prevents memory overload
  2. Error Handling: If an error occurs, only the current batch is affected
  3. Progress Tracking: Easier to monitor and track the ingestion progress
  4. Resource Management: Better control over CPU and network resource utilization

We use a conservative batch size of 50 to ensure reliable operation. The optimal batch size depends on many factors including document sizes, available system resources, network conditions, and concurrent workload.

batch_size = 50

# Automatic Batch Processing
articles = [article for article in unique_news_articles if article and len(article) <= 50000]

try:
    vector_store.add_texts(
        texts=articles,
        batch_size=batch_size
    )
    logging.info("Document ingestion completed successfully.")
except Exception as e:
    raise ValueError(f"Failed to save documents to vector store: {str(e)}")
2025-10-06 10:19:43 [INFO] Document ingestion completed successfully.
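If you want per-batch progress logging or retries rather than relying on `add_texts`'s internal `batch_size` handling, the slicing itself is simple. This is a sketch; the `vector_store.add_texts` call in the usage comment is the same one used above.

```python
def batched(items, batch_size):
    """Yield successive fixed-size slices of a list."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

# Usage sketch:
# for i, batch in enumerate(batched(articles, 50), start=1):
#     vector_store.add_texts(texts=batch)
#     logging.info(f"Ingested batch {i} ({len(batch)} articles)")
```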

Vector Search Performance Testing

Now let's demonstrate the performance benefits of Hyperscale vector index optimization by testing pure vector search performance. We'll compare three optimization levels:

  1. Baseline Performance: Vector search without Hyperscale vector index optimization
  2. Hyperscale-Optimized Performance: Same search with Hyperscale vector index
  3. Cache Benefits: Show how caching can be applied on top of Hyperscale vector index for repeated queries

Important: This testing focuses on pure vector search performance, isolating the Hyperscale vector index improvements from other workflow overhead.

Create Vector Search Function

import time

# Create Hyperscale vector retriever optimized for high-performance searches
retriever = vector_store.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 4}  # Return top 4 most similar documents
)

def test_vector_search_performance(query_text, label="Vector Search"):
    """Test pure vector search performance and return timing metrics"""
    print(f"\n[{label}] Testing vector search performance")
    print(f"[{label}] Query: '{query_text}'")
    
    start_time = time.time()
    
    try:
        # Perform vector search using the retriever
        docs = retriever.invoke(query_text)
        end_time = time.time()
        
        search_time = end_time - start_time
        print(f"[{label}] Vector search completed in {search_time:.4f} seconds")
        print(f"[{label}] Found {len(docs)} relevant documents")
        
        # Show a preview of the first result
        if docs:
            preview = docs[0].page_content[:100] + "..." if len(docs[0].page_content) > 100 else docs[0].page_content
            print(f"[{label}] Top result preview: {preview}")
        
        return search_time
    except Exception as e:
        print(f"[{label}] Vector search failed: {str(e)}")
        return None

Test 1: Baseline Performance (No Hyperscale Vector Index)

Test pure vector search performance without Hyperscale vector index optimization.

# Test baseline vector search performance without Hyperscale vector index
test_query = "What are the latest developments in football transfers?"
print("Testing baseline vector search performance without Hyperscale vector index optimization...")
baseline_time = test_vector_search_performance(test_query, "Baseline Search")
print(f"\nBaseline vector search time (without Hyperscale vector index): {baseline_time:.4f} seconds\n")
Testing baseline vector search performance without Hyperscale vector index optimization...

[Baseline Search] Testing vector search performance
[Baseline Search] Query: 'What are the latest developments in football transfers?'
[Baseline Search] Vector search completed in 1.3999 seconds
[Baseline Search] Found 4 relevant documents
[Baseline Search] Top result preview: The latest updates and analysis from the BBC.

Baseline vector search time (without Hyperscale vector index): 1.3999 seconds

Create Hyperscale Vector Index

Now let's create a Hyperscale vector index to enable high-performance vector searches. The index creation is done programmatically through the vector store, which will optimize the index settings based on our data and requirements.

# Create Hyperscale Vector Index for high-performance searches
print("Creating Hyperscale vector index...")
try:
    # Create a Hyperscale index optimized for pure vector searches
    vector_store.create_index(
        index_type=INDEX_TYPE,  # Hyperscale index type
        index_description=INDEX_DESCRIPTION  # IVF,SQ8 for optimized performance
    )
    print(f"Hyperscale Vector index created successfully")
    logging.info(f"Hyperscale index created with description '{INDEX_DESCRIPTION}'")
    
    # Wait a moment for index to be available
    print("Waiting for index to become available...")
    time.sleep(5)
    
except Exception as e:
    # Index might already exist, which is fine
    if "already exists" in str(e).lower():
        print(f"Hyperscale Vector index already exists, proceeding...")
        logging.info(f"Index already exists")
    else:
        logging.error(f"Failed to create Hyperscale vector index: {str(e)}")
        raise RuntimeError(f"Hyperscale vector index creation failed: {str(e)}")
Creating Hyperscale vector index...


2025-10-06 10:20:15 [INFO] Hyperscale index created with description 'IVF,SQ8'


Hyperscale Vector index created successfully
Waiting for index to become available...

Test 2: Hyperscale-Optimized Performance

Test the same vector search with Hyperscale vector index optimization.

# Test vector search performance with Hyperscale vector index
print("Testing vector search performance with Hyperscale vector index optimization...")
hyperscale_search_time = test_vector_search_performance(test_query, "Hyperscale-Optimized Search")
Testing vector search performance with Hyperscale vector index optimization...

[Hyperscale-Optimized Search] Testing vector search performance
[Hyperscale-Optimized Search] Query: 'What are the latest developments in football transfers?'
[Hyperscale-Optimized Search] Vector search completed in 0.5885 seconds
[Hyperscale-Optimized Search] Found 4 relevant documents
[Hyperscale-Optimized Search] Top result preview: Four key areas for Everton's new owners to address

Everton fans last saw silverware in 1995 when th...

Test 3: Cache Benefits Testing

Now let's demonstrate how caching can improve performance for repeated queries. Note: Caching benefits apply to both baseline and Hyperscale-optimized searches.

# Test cache benefits with a different query to avoid interference
cache_test_query = "What happened in the latest Premier League matches?"

print("Testing cache benefits with vector search...")
print("First execution (cache miss):")
cache_time_1 = test_vector_search_performance(cache_test_query, "Cache Test - First Run")

print("\nSecond execution (cache hit - should be faster):")
cache_time_2 = test_vector_search_performance(cache_test_query, "Cache Test - Second Run")
Testing cache benefits with vector search...
First execution (cache miss):

[Cache Test - First Run] Testing vector search performance
[Cache Test - First Run] Query: 'What happened in the latest Premier League matches?'
[Cache Test - First Run] Vector search completed in 0.6450 seconds
[Cache Test - First Run] Found 4 relevant documents
[Cache Test - First Run] Top result preview: Who has made Troy's Premier League team of the week?

After every round of Premier League matches th...

Second execution (cache hit - should be faster):

[Cache Test - Second Run] Testing vector search performance
[Cache Test - Second Run] Query: 'What happened in the latest Premier League matches?'
[Cache Test - Second Run] Vector search completed in 0.4306 seconds
[Cache Test - Second Run] Found 4 relevant documents
[Cache Test - Second Run] Top result preview: Who has made Troy's Premier League team of the week?

After every round of Premier League matches th...

Vector Search Performance Analysis

Let's analyze the vector search performance improvements across all optimization levels:

print("\n" + "="*80)
print("VECTOR SEARCH PERFORMANCE OPTIMIZATION SUMMARY")
print("="*80)

print(f"Phase 1 - Baseline Search (No Hyperscale):     {baseline_time:.4f} seconds")
print(f"Phase 2 - Hyperscale-Optimized Search:         {hyperscale_search_time:.4f} seconds")
if cache_time_1 and cache_time_2:
    print(f"Phase 3 - Cache Benefits:")
    print(f"  First execution (cache miss):         {cache_time_1:.4f} seconds")
    print(f"  Second execution (cache hit):         {cache_time_2:.4f} seconds")

print("\n" + "-"*80)
print("VECTOR SEARCH OPTIMIZATION IMPACT:")
print("-"*80)

# Hyperscale improvement analysis
if baseline_time and hyperscale_search_time:
    speedup = baseline_time / hyperscale_search_time if hyperscale_search_time > 0 else float('inf')
    time_saved = baseline_time - hyperscale_search_time
    percent_improvement = (time_saved / baseline_time) * 100
    print(f"Hyperscale Index Benefit:      {speedup:.2f}x faster ({percent_improvement:.1f}% improvement)")

# Cache improvement analysis
if cache_time_1 and cache_time_2 and cache_time_2 < cache_time_1:
    cache_speedup = cache_time_1 / cache_time_2
    cache_improvement = ((cache_time_1 - cache_time_2) / cache_time_1) * 100
    print(f"Cache Benefit:          {cache_speedup:.2f}x faster ({cache_improvement:.1f}% improvement)")
else:
    print(f"Cache Benefit:          Variable (depends on query complexity and caching mechanism)")

print(f"\nKey Insights for Vector Search Performance:")
print(f"• Hyperscale indexes provide significant performance improvements for vector similarity search")
print(f"• Performance gains are most dramatic for complex semantic queries")
print(f"• Hyperscale optimization is particularly effective for high-dimensional embeddings")
print(f"• Combined with proper quantization (SQ8), Hyperscale vector indexes deliver production-ready performance")
print(f"• These performance improvements directly benefit any application using the vector store")
================================================================================
VECTOR SEARCH PERFORMANCE OPTIMIZATION SUMMARY
================================================================================
Phase 1 - Baseline Search (No Hyperscale):     1.3999 seconds
Phase 2 - Hyperscale-Optimized Search:         0.5885 seconds
Phase 3 - Cache Benefits:
  First execution (cache miss):         0.6450 seconds
  Second execution (cache hit):         0.4306 seconds

--------------------------------------------------------------------------------
VECTOR SEARCH OPTIMIZATION IMPACT:
--------------------------------------------------------------------------------
Hyperscale Index Benefit:      2.38x faster (58.0% improvement)
Cache Benefit:          1.50x faster (33.2% improvement)

Key Insights for Vector Search Performance:
• Hyperscale indexes provide significant performance improvements for vector similarity search
• Performance gains are most dramatic for complex semantic queries
• Hyperscale optimization is particularly effective for high-dimensional embeddings
• Combined with proper quantization (SQ8), Hyperscale vector indexes deliver production-ready performance
• These performance improvements directly benefit any application using the vector store

CrewAI Agent Setup

What is CrewAI?

Now that we've optimized our vector search performance, let's build a sophisticated agent-based RAG system using CrewAI. CrewAI enables us to create specialized AI agents that collaborate to handle different aspects of the RAG workflow:

  • Research Agent: Finds and analyzes relevant documents using our optimized vector search
  • Writer Agent: Takes research findings and creates polished, structured responses
  • Collaborative Workflow: Agents work together, with the writer building on the researcher's findings

This multi-agent approach produces higher-quality responses than single-agent systems by separating research and writing expertise, while benefiting from the Hyperscale vector index performance improvements we just demonstrated.

Create Vector Search Tool

# Define the Hyperscale vector search tool using the @tool decorator
@tool("hyperscale_vector_search")
def search_tool(query: str) -> str:
    """Search for relevant documents using Hyperscale vector similarity.
    Input should be a simple text query string.
    Returns a list of relevant document contents from Hyperscale vector search.
    Use this tool to find detailed information about topics using high-performance Hyperscale indexes."""
    
    # Invoke the Hyperscale vector retriever (now optimized with HYPERSCALE index)
    docs = retriever.invoke(query)

    # Format the retrieved documents as numbered, delimited blocks for the agent
    formatted_docs = "\n\n".join([
        f"Document {i+1}:\n{'-'*40}\n{doc.page_content}"
        for i, doc in enumerate(docs)
    ])
    return formatted_docs
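The formatting step inside the tool can be checked in isolation with stand-in documents. The `FakeDoc` class below is a hypothetical substitute for the retriever's document objects (only the `page_content` attribute is needed); the real tool uses the `retriever` built earlier in this tutorial:

```python
# Hypothetical stand-in for a retrieved document, used only to
# exercise the same formatting logic the tool applies.
class FakeDoc:
    def __init__(self, page_content: str):
        self.page_content = page_content

docs = [FakeDoc("First match."), FakeDoc("Second match.")]

# Same join/format expression as in search_tool above.
formatted = "\n\n".join(
    f"Document {i+1}:\n{'-'*40}\n{doc.page_content}"
    for i, doc in enumerate(docs)
)
print(formatted)
```

Each document is numbered and separated by a dashed rule, which gives the research agent clearly delimited passages to cite from.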

Create CrewAI Agents

# Create research agent
researcher = Agent(
    role='Research Expert',
    goal='Find and analyze the most relevant documents to answer user queries accurately',
    backstory="""You are an expert researcher with deep knowledge in information retrieval 
    and analysis. Your expertise lies in finding, evaluating, and synthesizing information 
    from various sources. You have a keen eye for detail and can identify key insights 
    from complex documents. You always verify information across multiple sources and 
    provide comprehensive, accurate analyses.""",
    tools=[search_tool],
    llm=llm,
    verbose=False,
    memory=True,
    allow_delegation=False
)

# Create writer agent
writer = Agent(
    role='Technical Writer',
    goal='Generate clear, accurate, and well-structured responses based on research findings',
    backstory="""You are a skilled technical writer with expertise in making complex 
    information accessible and engaging. You excel at organizing information logically, 
    explaining technical concepts clearly, and creating well-structured documents. You 
    ensure all information is properly cited, accurate, and presented in a user-friendly 
    manner. You have a talent for maintaining the reader's interest while conveying 
    detailed technical information.""",
    llm=llm,
    verbose=False,
    memory=True,
    allow_delegation=False
)

print("CrewAI agents created successfully with optimized Hyperscale vector search")
CrewAI agents created successfully with optimized Hyperscale vector search

How the Optimized RAG Workflow Works

The complete optimized RAG process:

  1. User Query → Research Agent
  2. Vector Search → Hyperscale index finds similar documents (now with proven performance improvements)
  3. Document Analysis → Research Agent analyzes and synthesizes findings
  4. Response Writing → Writer Agent creates polished, structured response
  5. Final Output → User receives comprehensive, well-formatted answer

Key Benefit: The vector search performance improvements we demonstrated directly enhance the agent workflow efficiency.

CrewAI Agent Demo

Now let's demonstrate the complete optimized agent-based RAG system in action, benefiting from the Hyperscale vector index performance improvements we validated earlier.

Demo Function

def process_interactive_query(query, researcher, writer):
    """Run complete RAG workflow with CrewAI agents using optimized Hyperscale vector search"""
    print(f"\nProcessing Query: {query}")
    print("=" * 80)
    
    # Create tasks
    research_task = Task(
        description=f"Research and analyze information relevant to: {query}",
        agent=researcher,
        expected_output="A detailed analysis with key findings"
    )
    
    writing_task = Task(
        description=f"Create a comprehensive, well-structured response to: {query}",
        agent=writer,
        expected_output="A clear, well-structured answer",
        context=[research_task]
    )
    
    # Execute crew
    crew = Crew(
        agents=[researcher, writer],
        tasks=[research_task, writing_task],
        process=Process.sequential,
        verbose=True,
        cache=True,
        planning=True
    )
    
    try:
        start_time = time.time()
        result = crew.kickoff()
        elapsed_time = time.time() - start_time
        
        print(f"\nCompleted in {elapsed_time:.2f} seconds")
        print("=" * 80)
        print("RESPONSE")
        print("=" * 80)
        print(result)
                
        return elapsed_time
    except Exception as e:
        print(f"Error: {str(e)}")
        return None

Run Agent-Based RAG Demo

# Disable logging for cleaner output
logging.disable(logging.CRITICAL)

# Run demo with a sample query
demo_query = "What are the key details about the FA Cup third round draw?"
final_time = process_interactive_query(demo_query, researcher, writer)

if final_time:
    print(f"\n\n✅ CrewAI agent demo completed successfully in {final_time:.2f} seconds")

Conclusion

You have successfully built a powerful agent-based RAG system that combines Couchbase's high-performance Hyperscale and Composite vector storage capabilities with CrewAI's multi-agent architecture. This tutorial demonstrated the complete pipeline from data ingestion to intelligent response generation, with real performance benchmarks showing the measurable improvements Hyperscale vector indexing provides (roughly a 2.4x speedup in our run).

