In this guide, we will walk you through building a powerful semantic search engine using Couchbase as the backend database and CrewAI for agent-based RAG operations. CrewAI allows us to create specialized agents that work together to handle different aspects of the RAG workflow, from document retrieval to response generation. This tutorial is designed to be beginner-friendly, with clear, step-by-step instructions that will equip you to create a fully functional semantic search system from scratch. Alternatively, if you want to perform semantic search using the Hyperscale or Composite Vector Index, please take a look at this.
This tutorial is available as a Jupyter Notebook (.ipynb file) that you can run interactively. You can access the original notebook here.
You can either:
When running Couchbase using Capella, the following prerequisites need to be met:
We'll install the following key libraries:
- datasets: For loading and managing our training data
- langchain-couchbase: To integrate Couchbase with LangChain for Search Vector Index storage and caching
- langchain-openai: For accessing OpenAI's embedding and chat models
- crewai: To create and orchestrate our AI agents for RAG operations
- python-dotenv: For securely managing environment variables and API keys

These libraries provide the foundation for building a semantic search engine with Search Vector Index embeddings, database integration, and agent-based RAG capabilities.
%pip install --quiet datasets==4.1.0 langchain-couchbase==0.4.0 langchain-openai==0.3.33 crewai==0.186.1 python-dotenv==1.1.1 ipywidgets
Note: you may need to restart the kernel to use updated packages.
The script starts by importing a series of libraries required for various tasks, including handling JSON, logging, time tracking, Couchbase connections, embedding generation, and dataset loading.
import getpass
import json
import logging
import os
import time
from datetime import timedelta
from couchbase.auth import PasswordAuthenticator
from couchbase.cluster import Cluster
from couchbase.diagnostics import PingState, ServiceType
from couchbase.exceptions import (InternalServerFailureException,
QueryIndexAlreadyExistsException,
ServiceUnavailableException)
from couchbase.management.buckets import CreateBucketSettings
from couchbase.management.search import SearchIndex
from couchbase.options import ClusterOptions
from datasets import load_dataset
from dotenv import load_dotenv
from crewai.tools import tool
from langchain_couchbase.vectorstores import CouchbaseSearchVectorStore
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from crewai import Agent, Crew, Process, Task
Logging is configured to track the progress of the script and capture any errors or warnings.
logging.basicConfig(
level=logging.INFO,
format='%(asctime)s [%(levelname)s] %(message)s',
datefmt='%Y-%m-%d %H:%M:%S'
)
# Suppress httpx logging
logging.getLogger('httpx').setLevel(logging.CRITICAL)
In this section, we prompt the user to input the essential configuration settings. These include sensitive information such as database credentials and specific configuration names. Instead of hardcoding these details into the script, we ask the user to provide them at runtime, ensuring flexibility and security.
The script uses environment variables to store sensitive information, enhancing the overall security and maintainability of your code by avoiding hardcoded values.
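As a hedged illustration, a `.env` file for this tutorial might look like the following. All values are placeholders matching the defaults used below; substitute your own credentials:

```
# .env -- placeholder values; replace with your own
OPENAI_API_KEY=sk-...
CB_HOST=couchbase://localhost
CB_USERNAME=Administrator
CB_PASSWORD=password
CB_BUCKET_NAME=vector-search-testing
INDEX_NAME=vector_search_crew
SCOPE_NAME=shared
COLLECTION_NAME=crew
```

If a variable is absent from the environment, the code below falls back to prompting for it interactively.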
# Load environment variables
load_dotenv("./.env")
# Configuration
OPENAI_API_KEY = os.getenv('OPENAI_API_KEY') or input("Enter your OpenAI API key: ")
if not OPENAI_API_KEY:
raise ValueError("OPENAI_API_KEY is not set")
CB_HOST = os.getenv('CB_HOST') or input("Enter Couchbase host (default: couchbase://localhost): ") or 'couchbase://localhost'
CB_USERNAME = os.getenv('CB_USERNAME') or input("Enter Couchbase username (default: Administrator): ") or 'Administrator'
CB_PASSWORD = os.getenv('CB_PASSWORD') or getpass.getpass("Enter Couchbase password (default: password): ") or 'password'
CB_BUCKET_NAME = os.getenv('CB_BUCKET_NAME') or input("Enter bucket name (default: vector-search-testing): ") or 'vector-search-testing'
INDEX_NAME = os.getenv('INDEX_NAME') or input("Enter index name (default: vector_search_crew): ") or 'vector_search_crew'
SCOPE_NAME = os.getenv('SCOPE_NAME') or input("Enter scope name (default: shared): ") or 'shared'
COLLECTION_NAME = os.getenv('COLLECTION_NAME') or input("Enter collection name (default: crew): ") or 'crew'
print("Configuration loaded successfully")
Configuration loaded successfully
Connecting to a Couchbase cluster is the foundation of our project. Couchbase will serve as our primary data store, handling all the storage and retrieval operations required for our semantic search engine. By establishing this connection, we enable our application to interact with the database, allowing us to perform operations such as storing embeddings, querying data, and managing collections. This connection is the gateway through which all data will flow, so ensuring it's set up correctly is paramount.
# Connect to Couchbase
try:
auth = PasswordAuthenticator(CB_USERNAME, CB_PASSWORD)
options = ClusterOptions(auth)
cluster = Cluster(CB_HOST, options)
cluster.wait_until_ready(timedelta(seconds=5))
print("Successfully connected to Couchbase")
except Exception as e:
print(f"Failed to connect to Couchbase: {str(e)}")
raise
Successfully connected to Couchbase
In this section, we verify that the Couchbase Search service is available and responding correctly. This is a crucial check because our Search Vector Index functionality depends on it. If any issues are detected with the Search service, the function will raise an exception, allowing us to catch and handle problems early before attempting vector operations.
def check_search_service(cluster):
"""Verify search service availability using ping"""
try:
# Get ping result
ping_result = cluster.ping()
search_available = False
# Check if search service is responding
for service_type, endpoints in ping_result.endpoints.items():
if service_type == ServiceType.Search:
for endpoint in endpoints:
if endpoint.state == PingState.OK:
search_available = True
print(f"Search service is responding at: {endpoint.remote}")
break
break
if not search_available:
raise RuntimeError("Search service not found or not responding")
print("Search service check passed successfully")
except Exception as e:
print(f"Health check failed: {str(e)}")
raise
try:
check_search_service(cluster)
except Exception as e:
print(f"Failed to check search service: {str(e)}")
raise
Search service is responding at: 18.117.138.157:18094
Search service check passed successfully
Create and configure a Couchbase bucket, scope, and collection for storing our vector data.
The setup function handles bucket creation, scope management, and collection setup, plus two additional tasks: ensuring a primary index exists and clearing any existing documents from the collection. It is then called to set up the main collection for vector embeddings.
def setup_collection(cluster, bucket_name, scope_name, collection_name):
try:
# Check if bucket exists, create if it doesn't
try:
bucket = cluster.bucket(bucket_name)
logging.info(f"Bucket '{bucket_name}' exists.")
except Exception as e:
logging.info(f"Bucket '{bucket_name}' does not exist. Creating it...")
bucket_settings = CreateBucketSettings(
name=bucket_name,
bucket_type='couchbase',
ram_quota_mb=1024,
flush_enabled=True,
num_replicas=0
)
cluster.buckets().create_bucket(bucket_settings)
time.sleep(2) # Wait for bucket creation to complete and become available
bucket = cluster.bucket(bucket_name)
logging.info(f"Bucket '{bucket_name}' created successfully.")
bucket_manager = bucket.collections()
# Check if scope exists, create if it doesn't
scopes = bucket_manager.get_all_scopes()
scope_exists = any(scope.name == scope_name for scope in scopes)
if not scope_exists and scope_name != "_default":
logging.info(f"Scope '{scope_name}' does not exist. Creating it...")
bucket_manager.create_scope(scope_name)
logging.info(f"Scope '{scope_name}' created successfully.")
# Check if collection exists, create if it doesn't
collections = bucket_manager.get_all_scopes()
collection_exists = any(
scope.name == scope_name and collection_name in [col.name for col in scope.collections]
for scope in collections
)
if not collection_exists:
logging.info(f"Collection '{collection_name}' does not exist. Creating it...")
bucket_manager.create_collection(scope_name, collection_name)
logging.info(f"Collection '{collection_name}' created successfully.")
else:
logging.info(f"Collection '{collection_name}' already exists. Skipping creation.")
# Wait for collection to be ready
collection = bucket.scope(scope_name).collection(collection_name)
time.sleep(2) # Give the collection time to be ready for queries
# Ensure primary index exists
try:
cluster.query(f"CREATE PRIMARY INDEX IF NOT EXISTS ON `{bucket_name}`.`{scope_name}`.`{collection_name}`").execute()
logging.info("Primary index present or created successfully.")
except Exception as e:
logging.warning(f"Error creating primary index: {str(e)}")
# Clear all documents in the collection
try:
query = f"DELETE FROM `{bucket_name}`.`{scope_name}`.`{collection_name}`"
cluster.query(query).execute()
logging.info("All documents cleared from the collection.")
except Exception as e:
logging.warning(f"Error while clearing documents: {str(e)}. The collection might be empty.")
return collection
except Exception as e:
raise RuntimeError(f"Error setting up collection: {str(e)}")
setup_collection(cluster, CB_BUCKET_NAME, SCOPE_NAME, COLLECTION_NAME)
2025-09-17 14:34:30 [INFO] Bucket 'vector-search-testing' exists.
2025-09-17 14:34:32 [INFO] Scope 'shared' does not exist. Creating it...
2025-09-17 14:34:33 [INFO] Scope 'shared' created successfully.
2025-09-17 14:34:34 [INFO] Collection 'crew' does not exist. Creating it...
2025-09-17 14:34:36 [INFO] Collection 'crew' created successfully.
2025-09-17 14:34:41 [INFO] Primary index present or created successfully.
2025-09-17 14:34:43 [INFO] All documents cleared from the collection.
<couchbase.collection.Collection at 0x14632ea50>
Semantic search requires an efficient way to retrieve relevant documents based on a user's query. This is where the Couchbase Search Vector Index comes into play. In this step, we load the Search Vector Index definition from a JSON file, which specifies how the index should be structured. This includes the fields to be indexed, the dimensions of the vectors, and other parameters that determine how the search engine processes queries based on vector similarity.
This CrewAI Search Vector Index configuration requires specific default settings to function properly. This tutorial uses the bucket named vector-search-testing with the scope shared and collection crew. The configuration is set up for vectors with exactly 1536 dimensions, using dot product similarity and optimized for recall. If you want to use a different bucket, scope, or collection, you will need to modify the index configuration accordingly.
For more information on creating a Search Vector Index, please follow the instructions at Couchbase Vector Search Documentation.
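For orientation, here is an abridged sketch of the shape `crew_index.json` typically takes. This outline is an assumption based on the general Couchbase Search index JSON format and the settings described above (1536 dimensions, dot product similarity, recall-optimized); refer to the linked documentation for the authoritative structure:

```json
{
  "name": "vector_search_crew",
  "type": "fulltext-index",
  "sourceName": "vector-search-testing",
  "params": {
    "mapping": {
      "types": {
        "shared.crew": {
          "properties": {
            "embedding": {
              "fields": [{
                "name": "embedding",
                "type": "vector",
                "dims": 1536,
                "similarity": "dot_product",
                "vector_index_optimized_for": "recall"
              }]
            }
          }
        }
      }
    }
  }
}
```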
# Load index definition
try:
with open('crew_index.json', 'r') as file:
index_definition = json.load(file)
except FileNotFoundError as e:
print(f"Error: crew_index.json file not found: {str(e)}")
raise
except json.JSONDecodeError as e:
print(f"Error: Invalid JSON in crew_index.json: {str(e)}")
raise
except Exception as e:
print(f"Error loading index definition: {str(e)}")
raise
With the index definition loaded, the next step is to create or update the Search Vector Index in Couchbase. This step is crucial because it optimizes our database for vector similarity search operations, allowing us to perform searches based on the semantic content of documents rather than just keywords. By creating or updating a Search Vector Index, we enable our search engine to handle complex queries that involve finding semantically similar documents using vector embeddings, which is essential for a robust semantic search engine.
try:
scope_index_manager = cluster.bucket(CB_BUCKET_NAME).scope(SCOPE_NAME).search_indexes()
# Check if index already exists
existing_indexes = scope_index_manager.get_all_indexes()
index_name = index_definition["name"]
if index_name in [index.name for index in existing_indexes]:
logging.info(f"Index '{index_name}' found")
else:
logging.info(f"Creating new index '{index_name}'...")
# Create SearchIndex object from JSON definition
search_index = SearchIndex.from_json(index_definition)
# Upsert the index (create if not exists, update if exists)
scope_index_manager.upsert_index(search_index)
logging.info(f"Index '{index_name}' successfully created/updated.")
except QueryIndexAlreadyExistsException:
logging.info(f"Index '{index_name}' already exists. Skipping creation/update.")
except ServiceUnavailableException:
raise RuntimeError("Search service is not available. Please ensure the Search service is enabled in your Couchbase cluster.")
except InternalServerFailureException as e:
logging.error(f"Internal server error: {str(e)}")
raise
2025-09-17 14:34:47 [INFO] Creating new index 'vector_search_crew'...
2025-09-17 14:34:48 [INFO] Index 'vector_search_crew' successfully created/updated.
This section initializes two key OpenAI components needed for our RAG system:
OpenAI Embeddings:
ChatOpenAI Language Model:
Both components require a valid OpenAI API key (OPENAI_API_KEY) for authentication. In the CrewAI framework, the LLM acts as the "brain" for each agent, allowing them to interpret tasks, retrieve relevant information via the RAG system, and generate appropriate outputs based on their specialized roles and expertise.
# Initialize OpenAI components
embeddings = OpenAIEmbeddings(
openai_api_key=OPENAI_API_KEY,
model="text-embedding-3-small"
)
llm = ChatOpenAI(
openai_api_key=OPENAI_API_KEY,
model="gpt-4o",
temperature=0.2
)
print("OpenAI components initialized")
OpenAI components initialized
A vector store is where we'll keep our embeddings. Unlike traditional text-based search, the Search Vector Store is specifically designed to handle embeddings and perform similarity searches. When a user inputs a query, the search engine converts the query into an embedding and compares it against the embeddings stored in the vector store. This allows the engine to find documents that are semantically similar to the query, even if they don't contain the exact same words. By setting up the Search Vector Store in Couchbase, we create a powerful tool that enables our search engine to understand and retrieve information based on the meaning and context of the query, rather than just the specific words used.
# Setup vector store
vector_store = CouchbaseSearchVectorStore(
cluster=cluster,
bucket_name=CB_BUCKET_NAME,
scope_name=SCOPE_NAME,
collection_name=COLLECTION_NAME,
embedding=embeddings,
index_name=INDEX_NAME,
)
print("Vector store initialized")
Vector store initialized
To build a search engine, we need data to search through. We use the BBC News dataset from RealTimeData, which provides real-world news articles. This dataset contains news articles from BBC covering various topics and time periods. Loading the dataset is a crucial step because it provides the raw material that our search engine will work with. The quality and diversity of the news articles make it an excellent choice for testing and refining our search engine, ensuring it can handle real-world news content effectively.
The BBC News dataset allows us to work with authentic news articles, enabling us to build and test a search engine that can effectively process and retrieve relevant news content. The dataset is loaded using the Hugging Face datasets library, specifically accessing the "RealTimeData/bbc_news_alltime" dataset with the "2024-12" version.
try:
news_dataset = load_dataset(
"RealTimeData/bbc_news_alltime", "2024-12", split="train"
)
print(f"Loaded the BBC News dataset with {len(news_dataset)} rows")
logging.info(f"Successfully loaded the BBC News dataset with {len(news_dataset)} rows.")
except Exception as e:
raise ValueError(f"Error loading the BBC News dataset: {str(e)}")
2025-09-17 14:35:10 [INFO] Successfully loaded the BBC News dataset with 2687 rows.
Loaded the BBC News dataset with 2687 rows
We will use the content of the news articles for our RAG system.
The dataset contains a few duplicate records. We are removing them to avoid duplicate results in the retrieval stage of our RAG system.
news_articles = news_dataset["content"]
unique_articles = set()
for article in news_articles:
if article:
unique_articles.add(article)
unique_news_articles = list(unique_articles)
print(f"We have {len(unique_news_articles)} unique articles in our database.")
We have 1749 unique articles in our database.
To efficiently handle the large number of articles, we process them in batches. This batch processing approach helps manage memory usage and provides better control over the ingestion process.
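A note on the de-duplication step above: a Python `set` does not preserve insertion order, so the order of `unique_news_articles` can vary between runs. If deterministic ordering matters to you, a small order-preserving sketch using `dict.fromkeys` works equally well:

```python
# Order-preserving de-duplication: dict keys retain first-seen order (Python 3.7+)
articles = ["alpha", "beta", "alpha", "gamma", "beta"]  # stand-in for news_dataset["content"]
unique_in_order = list(dict.fromkeys(a for a in articles if a))
print(unique_in_order)  # → ['alpha', 'beta', 'gamma']
```

The generator expression also filters out empty entries, mirroring the `if article:` check above.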
We first filter out any articles that exceed 50,000 characters to avoid potential issues with token limits. Then, using the vector store's add_texts method, we add the filtered articles to our vector database. The batch_size parameter controls how many articles are processed in each iteration.
This approach offers several benefits:
We use a conservative batch size of 50 to ensure reliable operation. The optimal batch size depends on many factors including document sizes, available system resources, network conditions, and concurrent workload.
batch_size = 50
# Automatic Batch Processing
articles = [article for article in unique_news_articles if article and len(article) <= 50000]
try:
vector_store.add_texts(
texts=articles,
batch_size=batch_size
)
logging.info("Document ingestion completed successfully.")
except Exception as e:
raise ValueError(f"Failed to save documents to vector store: {str(e)}")
2025-09-17 14:36:58 [INFO] Document ingestion completed successfully.
After loading our data into the vector store, we need to create a tool that can efficiently search through these vector embeddings. This involves two key components:
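If you need finer-grained control than `add_texts`' built-in `batch_size` gives you, for example to log progress or retry a failed batch independently, manual chunking is straightforward. A minimal sketch (the `chunk_list` helper is our own, not part of LangChain):

```python
def chunk_list(items, size):
    """Yield successive chunks of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

# Illustration with plain strings; in the tutorial each batch would be
# passed to vector_store.add_texts(texts=batch) instead of printed.
docs = [f"article-{n}" for n in range(7)]
for batch_num, batch in enumerate(chunk_list(docs, 3), start=1):
    print(f"batch {batch_num}: {len(batch)} docs")
```

This prints three batches of sizes 3, 3, and 1, and gives each batch its own try/except boundary if you wrap the ingestion call.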
Vector Retriever: The vector retriever is configured to perform similarity searches. This creates a retriever that performs semantic similarity searches against our vector database. The similarity search finds documents whose vector embeddings are closest to the query's embedding in the vector space.
Search Tool: The search tool wraps the retriever in a user-friendly interface that:
The tool is designed to integrate seamlessly with our AI agents, providing them with reliable access to our knowledge base through vector similarity search.
# Create vector retriever
retriever = vector_store.as_retriever(
search_type="similarity",
)
# Define the search tool using the @tool decorator
@tool("vector_search")
def search_tool(query: str) -> str:
"""Search for relevant documents using vector similarity.
Input should be a simple text query string.
Returns a list of relevant document contents.
Use this tool to find detailed information about topics."""
# CrewAI passes the query as a plain string based on the task description,
# so no extra input coercion is needed here.
# Invoke the retriever
docs = retriever.invoke(query)
# Format the results
formatted_docs = "\n\n".join([
f"Document {i+1}:\n{'-'*40}\n{doc.page_content}"
for i, doc in enumerate(docs)
])
return formatted_docs
We'll create two specialized AI agents using the CrewAI framework to handle different aspects of our information retrieval and analysis system:
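To see the shape of what the tool returns without a live retriever, here is a self-contained sketch using stub documents. The `StubDoc` class is our own stand-in for LangChain's `Document`; the formatting logic is the same as in `search_tool` above:

```python
class StubDoc:
    """Minimal stand-in for a retrieved LangChain Document."""
    def __init__(self, page_content):
        self.page_content = page_content

docs = [StubDoc("First article text."), StubDoc("Second article text.")]

# Same join/format expression used by the vector_search tool
formatted = "\n\n".join(
    f"Document {i+1}:\n{'-'*40}\n{doc.page_content}"
    for i, doc in enumerate(docs)
)
print(formatted)
```

Each result is numbered and separated by a dashed rule, which helps the agent attribute statements to specific retrieved documents.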
Research Expert Agent: This agent is designed to:
Technical Writer Agent: This agent is responsible for:
Agent Workflow: The agents work together in a coordinated way:
This multi-agent approach allows us to:
# Custom response template
response_template = """
Analysis Results
===============
{%- if .Response %}
{{ .Response }}
{%- endif %}
Sources
=======
{%- for tool in .Tools %}
* {{ tool.name }}
{%- endfor %}
Metadata
========
* Confidence: {{ .Confidence }}
* Analysis Time: {{ .ExecutionTime }}
"""
# Create research agent
researcher = Agent(
role='Research Expert',
goal='Find and analyze the most relevant documents to answer user queries accurately',
backstory="""You are an expert researcher with deep knowledge in information retrieval
and analysis. Your expertise lies in finding, evaluating, and synthesizing information
from various sources. You have a keen eye for detail and can identify key insights
from complex documents. You always verify information across multiple sources and
provide comprehensive, accurate analyses.""",
tools=[search_tool],
llm=llm,
verbose=True,
memory=True,
allow_delegation=False,
response_template=response_template
)
# Create writer agent
writer = Agent(
role='Technical Writer',
goal='Generate clear, accurate, and well-structured responses based on research findings',
backstory="""You are a skilled technical writer with expertise in making complex
information accessible and engaging. You excel at organizing information logically,
explaining technical concepts clearly, and creating well-structured documents. You
ensure all information is properly cited, accurate, and presented in a user-friendly
manner. You have a talent for maintaining the reader's interest while conveying
detailed technical information.""",
llm=llm,
verbose=True,
memory=True,
allow_delegation=False,
response_template=response_template
)
print("Agents created successfully")
Agents created successfully
This system uses a two-agent approach to implement Retrieval-Augmented Generation (RAG):
Research Expert Agent:
Technical Writer Agent:
The Complete RAG Process:
This multi-agent approach separates concerns (research vs. writing) and leverages specialized expertise for each task, resulting in higher quality responses.
Test the system with example queries.
def process_query(query, researcher, writer):
"""
Run complete RAG workflow with CrewAI agents using Search Vector Store.
This function tests both the vector search capability and the agent-based processing:
1. Vector search: Retrieves relevant documents from Couchbase Search Vector Store
2. Agent processing: Uses CrewAI agents to analyze and format the response
The function measures performance and displays detailed outputs from each step.
"""
print(f"\nQuery: {query}")
print("-" * 80)
# Create tasks
research_task = Task(
description=f"Research and analyze information relevant to: {query}",
agent=researcher,
expected_output="A detailed analysis with key findings and supporting evidence"
)
writing_task = Task(
description="Create a comprehensive and well-structured response",
agent=writer,
expected_output="A clear, comprehensive response that answers the query",
context=[research_task]
)
# Create and execute crew
crew = Crew(
agents=[researcher, writer],
tasks=[research_task, writing_task],
process=Process.sequential,
verbose=True,
cache=True,
planning=True
)
try:
start_time = time.time()
result = crew.kickoff()
elapsed_time = time.time() - start_time
print(f"\nQuery completed in {elapsed_time:.2f} seconds")
print("=" * 80)
print("RESPONSE")
print("=" * 80)
print(result)
if hasattr(result, 'tasks_output'):
print("\n" + "=" * 80)
print("DETAILED TASK OUTPUTS")
print("=" * 80)
for task_output in result.tasks_output:
print(f"\nTask: {task_output.description[:100]}...")
print("-" * 40)
print(f"Output: {task_output.raw}")
print("-" * 40)
except Exception as e:
print(f"Error executing crew: {str(e)}")
logging.error(f"Crew execution failed: {str(e)}", exc_info=True)
# Disable logging before running the query
logging.disable(logging.CRITICAL)
query = "What are the key details about the FA Cup third round draw? Include information about Manchester United vs Arsenal, Tamworth vs Tottenham, and other notable fixtures."
process_query(query, researcher, writer)
Query: What are the key details about the FA Cup third round draw? Include information about Manchester United vs Arsenal, Tamworth vs Tottenham, and other notable fixtures.
--------------------------------------------------------------------------------
Crew Execution Started (Name: crew, ID: 02c49af6-ffe5-4bea-8cba-f3f08049625d)
[2025-09-17 14:36:58][INFO]: Planning the crew execution
[EventBus Error] Handler 'on_task_started' failed for event 'TaskStartedEvent': 'NoneType' object has no attribute 'key'
Task Completed (Name: 5d4df0c5-14ad-47d7-8412-2cb8438a65df, Agent: Task Execution Planner)
Agent Tool Execution (Agent: Research Expert)
Thought: To gather detailed information about the FA Cup third round draw, specifically focusing on the matches Manchester United vs Arsenal and Tamworth vs Tottenham, I will perform a vector search using a relevant query.
Using Tool: vector_search
Tool Input: {"query": "FA Cup third round draw Manchester United vs Arsenal Tamworth vs Tottenham"}
Task Completed (Name: d883be8b-ac2a-4678-80b3-afdc803bd716, Agent: Research Expert)
Task Completed (Name: 674a305d-1a6f-4b60-9497-ff4140f0f473, Agent: Technical Writer)
Query completed in 38.89 seconds
================================================================================
RESPONSE
================================================================================
**FA Cup Third Round Draw: A Comprehensive Overview**
The FA Cup third round draw is a pivotal moment in the English football calendar, marking the entry of Premier League and Championship clubs into the competition. This stage often brings thrilling encounters and the potential for giant-killing acts, capturing the imagination of fans worldwide. The significance of the third round is underscored by the rich history and tradition of the FA Cup, the world's oldest national football competition.
**Manchester United vs Arsenal**
One of the standout fixtures of the third round is the clash between Manchester United and Arsenal. This match is set to take place over the weekend of Saturday, 11 January. Manchester United, the current holders of the FA Cup, will travel to face Arsenal, who have won the competition a record 14 times. The match is significant as it involves two of the most successful clubs in FA Cup history, both known for their storied pasts and passionate fanbases.
- **Date and Venue:** Weekend of Saturday, 11 January, at Arsenal's home ground.
- **Team Statistics:** Manchester United have lifted the FA Cup 13 times, while Arsenal hold the record with 14 victories.
- **Recent Form:** Manchester United recently triumphed over Manchester City to claim their 13th FA Cup title, showcasing their competitive edge.
- **Predictions and Insights:** Given the historical rivalry and the stakes involved, this fixture promises to be a fiercely contested battle, with both teams eager to progress further in the tournament.
**Tamworth vs Tottenham**
Another intriguing fixture is the match between non-league side Tamworth and Premier League club Tottenham Hotspur. Tamworth, one of only two non-league clubs remaining in the competition, will host Spurs, highlighting the classic "David vs Goliath" narrative that the FA Cup is renowned for.
- **Date and Venue:** To be played at Tamworth's home ground over the weekend of Saturday, 11 January.
- **Team Statistics:** Tamworth is the lowest-ranked team remaining in the competition, while Tottenham is a well-established Premier League club.
- **Recent Form:** Tamworth secured their place in the third round with a dramatic penalty shootout victory against League One side Burton Albion.
... (output truncated for brevity)
By following these steps, you've built a powerful RAG system that combines Couchbase's Search Vector Index storage capabilities with CrewAI's agent-based architecture. This multi-agent approach separates research and writing concerns, resulting in higher quality responses to user queries.
The system demonstrates several key advantages:
Whether you're building a customer support system, a research assistant, or a knowledge management solution, this agent-based RAG approach provides a flexible foundation that can be adapted to various use cases and domains.