In this guide, we will walk you through building a powerful semantic search engine using Couchbase as the backend database and Hugging Face as the AI-powered embedding model provider. Semantic search goes beyond simple keyword matching by understanding the context and meaning behind the words in a query, making it an essential tool for applications that require intelligent information retrieval.
This tutorial demonstrates how to leverage Couchbase's Hyperscale and Composite Vector Indexes with Hugging Face embeddings to create a high-performance semantic search system. These vector indexes offer significant advantages over Search Vector Index approaches, particularly for vector-first workloads and scenarios requiring complex filtering with high query-per-second (QPS) performance.
For more information on Hyperscale and Composite Vector Indexes, see the Couchbase Vector Index Documentation.
This guide is designed to be comprehensive yet accessible, with clear step-by-step instructions that will equip you with the knowledge to create a fully functional semantic search system. Whether you're building a recommendation engine, content discovery platform, or any application requiring intelligent document retrieval, this tutorial provides the foundation you need.
Note: If you want to perform semantic search using the Search Vector Index instead, please take a look at this alternative tutorial.
This tutorial is available as a Jupyter Notebook (.ipynb file) that you can run interactively. You can access the original notebook here.
You can either download the notebook file and run it on Google Colab or run it on your system by setting up the Python environment.
!pip install --quiet langchain-couchbase==1.0.1 transformers==4.56.1 sentence_transformers==5.1.0 langchain_huggingface python-dotenv==1.1.1 ipywidgets
from pathlib import Path
from datetime import timedelta
from transformers import pipeline, AutoModel, AutoTokenizer
from langchain_huggingface.embeddings.huggingface import HuggingFaceEmbeddings
from couchbase.auth import PasswordAuthenticator
from couchbase.cluster import Cluster
from couchbase.options import ClusterOptions
from langchain_core.globals import set_llm_cache
from langchain_couchbase.cache import CouchbaseCache
from langchain_couchbase.vectorstores import CouchbaseQueryVectorStore
from langchain_couchbase.vectorstores import DistanceStrategy
from langchain_couchbase.vectorstores import IndexType
import getpass
import os
from dotenv import load_dotenv
To run this tutorial successfully, you will need the following requirements:
Version Requirements:
Access Requirements:
To get started with Couchbase Capella, create an account and use it to deploy a forever free tier operational cluster. This account provides you with an environment where you can explore and learn about Capella with no time constraint.
To learn more, please follow the instructions.
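The configuration cell below reads its settings from a .env file. Here is a minimal example with placeholder values (the cluster URL, bucket, scope, and collection names shown are purely illustrative; substitute your own Capella details):

```shell
# .env — placeholder values only; replace with your own cluster details
CB_CLUSTER_URL=couchbases://cb.example.cloud.couchbase.com
CB_USERNAME=your-username
CB_PASSWORD=your-password
CB_BUCKET=your-bucket
CB_SCOPE=your-scope
CB_COLLECTION=your-collection
```

Keep this file out of version control, since it contains credentials.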
When running Couchbase using Capella, the following prerequisites need to be met:
langchain-couchbase==1.0.1
transformers==4.56.1
sentence_transformers==5.1.0
# Load environment variables
load_dotenv("./.env")
# Configuration
couchbase_cluster_url = os.getenv('CB_CLUSTER_URL') or input("Couchbase Cluster URL:")
couchbase_username = os.getenv('CB_USERNAME') or input("Couchbase Username:")
couchbase_password = os.getenv('CB_PASSWORD') or getpass.getpass("Couchbase password:")
couchbase_bucket = os.getenv('CB_BUCKET') or input("Couchbase Bucket:")
couchbase_scope = os.getenv('CB_SCOPE') or input("Couchbase Scope:")
couchbase_collection = os.getenv('CB_COLLECTION') or input("Couchbase Collection:")
In this section, we first need to create a PasswordAuthenticator object that holds our Couchbase credentials:
auth = PasswordAuthenticator(
    couchbase_username,
    couchbase_password
)
Then, we use this object to connect to the Couchbase cluster and select the bucket, scope, and collection specified above:
print("Connecting to cluster at URL: " + couchbase_cluster_url)
cluster = Cluster(couchbase_cluster_url, ClusterOptions(auth))
cluster.wait_until_ready(timedelta(seconds=5))
bucket = cluster.bucket(couchbase_bucket)
scope = bucket.scope(couchbase_scope)
collection = scope.collection(couchbase_collection)
print("Connected to the cluster")
Connecting to cluster at URL: couchbase://localhost
Connected to the cluster
With Couchbase 8.0+, you can leverage the power of Hyperscale and Composite Vector Indexes, which offer significant performance improvements over Search Vector Index approaches for vector-first workloads. These indexes provide high-performance vector similarity search with advanced filtering capabilities and are designed to scale to billions of vectors.
| Feature | Hyperscale & Composite Vector Index | Search Vector Index |
|---|---|---|
| Best For | Vector-first workloads, complex filtering, high QPS performance | Hybrid search and high recall rates |
| Couchbase Version | 8.0.0+ | 7.6+ |
| Filtering | Pre-filtering with WHERE clauses (Composite) or post-filtering (Hyperscale) | Pre-filtering with flexible ordering |
| Scalability | Up to billions of vectors (Hyperscale) | Up to 10 million vectors |
| Performance | Optimized for concurrent operations with low memory footprint | Good for mixed text and vector queries |
Couchbase offers two distinct query-based vector index types, each optimized for different use cases:
In this tutorial, we'll demonstrate creating a Hyperscale Vector Index and running vector similarity queries. Hyperscale is ideal for semantic search scenarios where you want:
The Hyperscale Vector Index will provide optimal performance for our Hugging Face embedding-based semantic search implementation.
If your use case requires complex filtering with scalar attributes, you may want to consider using a Composite Vector Index instead:
# Alternative: Create a Composite index for filtered searches
vector_store.create_index(
    index_type=IndexType.COMPOSITE,
    index_description="IVF,SQ8",
    distance_metric=DistanceStrategy.COSINE,
    index_name="huggingface_composite_index",
)
Use Composite indexes when:
Note: Composite indexes enable pre-filtering with scalar attributes, making them ideal for applications where you need to search within specific categories, date ranges, or user-specific data segments.
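To make the filtering distinction concrete, here is a small pure-Python sketch (toy documents and toy distances, not the Couchbase API): pre-filtering restricts candidates by a scalar attribute before ranking by vector distance, while post-filtering ranks everything first and discards non-matching results afterwards.

```python
# Toy illustration of pre-filtering vs. post-filtering (not Couchbase API).
docs = [
    {"text": "doc A", "category": "db", "distance": 0.30},
    {"text": "doc B", "category": "ai", "distance": 0.10},
    {"text": "doc C", "category": "db", "distance": 0.50},
]

def pre_filter_search(docs, category, k):
    # Composite-style: restrict to the matching category first, then rank.
    candidates = [d for d in docs if d["category"] == category]
    return sorted(candidates, key=lambda d: d["distance"])[:k]

def post_filter_search(docs, category, k):
    # Hyperscale-style: rank all documents, then drop non-matching results.
    ranked = sorted(docs, key=lambda d: d["distance"])[:k]
    return [d for d in ranked if d["category"] == category]

print([d["text"] for d in pre_filter_search(docs, "db", k=2)])   # ['doc A', 'doc C']
print([d["text"] for d in post_filter_search(docs, "db", k=2)])  # ['doc A']
```

Note how post-filtering can return fewer than k matching results, which is why Composite indexes tend to be the better fit when scalar filters are selective.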
Before creating our Hyperscale index, it's important to understand the configuration parameters that optimize vector storage and search performance. The index_description parameter controls how Couchbase optimizes vector storage through centroids and quantization.
The index_description parameter follows the format 'IVF[<centroids>],{PQ|SQ}<settings>'. When the centroid count is omitted (e.g., IVF,SQ8), Couchbase auto-selects it based on dataset size.
Scalar Quantization (SQ):
- SQ4, SQ6, SQ8 (4, 6, or 8 bits per dimension)
Product Quantization (PQ):
- PQ<subquantizers>x<bits> (e.g., PQ32x8)
Common examples:
- IVF,SQ8 - Auto centroids, 8-bit scalar quantization (good default)
- IVF1000,SQ6 - 1000 centroids, 6-bit scalar quantization
- IVF,PQ32x8 - Auto centroids, 32 subquantizers with 8 bits
For detailed configuration options, see the Quantization & Centroid Settings.
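To see why SQ8 keeps the memory footprint low, a rough back-of-the-envelope estimate helps. The sketch below is an illustration only: it ignores centroid and index-metadata overhead, and the 768-dimension figure is simply a common sentence-transformer embedding size, not a value taken from this tutorial's model.

```python
# Approximate storage for raw float32 vectors vs. SQ8-quantized vectors.
# Illustration only: ignores index metadata and centroid overhead.
def vector_storage_bytes(num_vectors, dims, bits_per_dim):
    return num_vectors * dims * bits_per_dim // 8

num_vectors = 1_000_000
dims = 768  # a common sentence-transformer embedding dimensionality

raw = vector_storage_bytes(num_vectors, dims, 32)  # float32: 32 bits/dim
sq8 = vector_storage_bytes(num_vectors, dims, 8)   # SQ8: 8 bits/dim

print(f"float32: {raw / 1e9:.2f} GB")  # 3.07 GB
print(f"SQ8:     {sq8 / 1e9:.2f} GB")  # 0.77 GB (4x smaller)
```

The same arithmetic shows why SQ6 and SQ4 trade additional accuracy for even smaller footprints.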
For more information on Hyperscale and Composite Vector Indexes, see Couchbase Vector Index Documentation.
In this tutorial, we use IVF,SQ8 which provides:
# Create a Hyperscale Vector Index store (good default: IVF,SQ8)
vector_store = CouchbaseQueryVectorStore(
    cluster=cluster,
    bucket_name=couchbase_bucket,
    scope_name=couchbase_scope,
    collection_name=couchbase_collection,
    embedding=HuggingFaceEmbeddings(),  # Hugging Face Initialization
    distance_metric=DistanceStrategy.COSINE
)
Now that we have set up our vector store with Hugging Face embeddings, we can add documents to our collection. The CouchbaseQueryVectorStore automatically handles the embedding generation process using the Hugging Face transformers library.
When we add text documents to our vector store, several important processes happen automatically:
In this example, we're adding sample documents that demonstrate Couchbase's capabilities. The system will:
Note: The batch_size parameter controls how many documents are processed together, which can help optimize performance for large document sets.
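Conceptually, batching just splits the document list into fixed-size chunks before embedding and upserting. The sketch below illustrates the idea with a plain Python generator; it is not the library's internal implementation.

```python
# Minimal batching sketch: split a list of texts into fixed-size chunks.
def batched(items, batch_size):
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

docs = [f"document {i}" for i in range(100)]
batches = list(batched(docs, batch_size=32))
print([len(b) for b in batches])  # [32, 32, 32, 4]
```

Larger batches amortize per-call overhead but increase peak memory during embedding, which is the trade-off batch_size lets you tune.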
texts = [
    "Couchbase Server is a multipurpose, distributed database that fuses the strengths of relational databases such as SQL and ACID transactions with JSON’s versatility, with a foundation that is extremely fast and scalable.",
    "It’s used across industries for things like user profiles, dynamic product catalogs, GenAI apps, vector search, high-speed caching, and much more.",
    input("Enter custom embedding text:")
]
vector_store.add_texts(texts=texts, batch_size=32)
['7c601881e4bf4c53b5b4c2a25628d904',
'0442f351aec2415481138315d492ee80',
'e20a8dcd8b464e8e819b87c9a0ff05c3']
Now let's demonstrate the performance benefits of different optimization approaches available in Couchbase. We'll compare three optimization levels to show how each contributes to building a production-ready semantic search system:
Important: Caching is orthogonal to index types - you can apply caching benefits to both raw searches and optimized searches to improve repeated query performance.
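The cache's hit/miss behavior can be illustrated with a toy in-memory memoization wrapper. CouchbaseCache persists entries in a Couchbase collection rather than a Python dict, but the lookup logic is analogous.

```python
# Toy query cache illustrating hit/miss behavior (in-memory sketch;
# CouchbaseCache stores entries in a Couchbase collection instead).
cache = {}

def cached_search(query, search_fn):
    if query in cache:
        return cache[query], "hit"      # served from cache, no backend call
    result = search_fn(query)           # expensive call happens on a miss
    cache[query] = result
    return result, "miss"

calls = []
def fake_search(q):
    calls.append(q)                     # track how often the backend is hit
    return [f"result for {q}"]

_, status1 = cached_search("distributed database", fake_search)
_, status2 = cached_search("distributed database", fake_search)
print(status1, status2, len(calls))     # miss hit 1
```

The second lookup never reaches the backend, which is exactly the effect we measure later when comparing cache-miss and cache-hit timings.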
Before we start our RAG comparisons, let's understand what the search results mean:
When you perform a search query with vector search:
Note: The returned value represents the vector distance between query and document embeddings. Lower distance values indicate higher similarity.
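Since we configured DistanceStrategy.COSINE, the score is a cosine distance, and cosine similarity can typically be recovered as 1 - distance. A quick worked example with toy two-dimensional vectors:

```python
import math

# Cosine distance = 1 - cosine similarity; lower distance means more similar.
def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

query = [1.0, 0.0]
close_doc = [0.9, 0.1]  # points in nearly the same direction as the query
far_doc = [0.0, 1.0]    # orthogonal to the query

print(round(cosine_distance(query, close_doc), 4))  # small distance
print(cosine_distance(query, far_doc))              # 1.0
```

Real embeddings have hundreds of dimensions, but the ordering works the same way: the nearest documents are the ones whose vectors point in the most similar direction.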
Let's create a comprehensive search function for our RAG performance comparison:
import time

def search_with_performance_metrics(query_text, stage_name, k=3):
    """Perform optimized semantic search with detailed performance metrics"""
    print(f"\n=== {stage_name.upper()} ===")
    print(f"Query: \"{query_text}\"")

    start_time = time.time()
    results = vector_store.similarity_search_with_score(query_text, k=k)
    end_time = time.time()

    search_time = end_time - start_time
    print(f"Search Time: {search_time:.4f} seconds")
    print(f"Results Found: {len(results)} documents")

    for i, (doc, distance) in enumerate(results, 1):
        print(f"\n[Result {i}]")
        print(f"Vector Distance: {distance:.6f} (lower = more similar)")
        # Use the document content directly from search results (no additional KV call needed)
        print(f"Document Content: {doc.page_content}")
        if hasattr(doc, 'metadata') and doc.metadata:
            print(f"Metadata: {doc.metadata}")

    return search_time, results
First, let's establish baseline performance with raw vector search - no Hyperscale optimization yet:
test_query = "What are the key features of a scalable NoSQL database?"
print("Testing baseline performance without Hyperscale optimization...")
baseline_time, baseline_results = search_with_performance_metrics(
    test_query, "Phase 1: Baseline Vector Search"
)
Testing baseline performance without Hyperscale optimization...
=== PHASE 1: BASELINE VECTOR SEARCH ===
Query: "What are the key features of a scalable NoSQL database?"
Search Time: 0.1484 seconds
Results Found: 3 documents
[Result 1]
Vector Distance: 0.586197 (lower = more similar)
Document Content: Couchbase Server is a multipurpose, distributed database that fuses the strengths of relational databases such as SQL and ACID transactions with JSON's versatility, with a foundation that is extremely fast and scalable.
[Result 2]
Vector Distance: 0.645435 (lower = more similar)
Document Content: It's used across industries for things like user profiles, dynamic product catalogs, GenAI apps, vector search, high-speed caching, and much more.
[Result 3]
Vector Distance: 0.976888 (lower = more similar)
Document Content: this is a sample text with the data "hello"
Now let's create the Hyperscale Vector Index and measure the performance improvement:
# Create Hyperscale index for optimized vector search
print("Creating Hyperscale Vector Index...")

try:
    vector_store.create_index(
        index_type=IndexType.HYPERSCALE,
        index_description="IVF,SQ8",
        distance_metric=DistanceStrategy.COSINE,
        index_name="huggingface_hyperscale_index",
    )
    print("✓ Hyperscale Vector Index created successfully!")

    # Wait for index to become available
    print("Waiting for index to become available...")
    time.sleep(3)
except Exception as e:
    if "already exists" in str(e).lower():
        print("✓ Hyperscale Vector Index already exists, proceeding...")
    else:
        print(f"Error creating Hyperscale index: {str(e)}")

# Test the same query with Hyperscale optimization
print("\nTesting performance with Hyperscale optimization...")
optimized_time, optimized_results = search_with_performance_metrics(
    test_query, "Phase 2: Optimized Search"
)
Creating Hyperscale Vector Index...
✓ Hyperscale Vector Index created successfully!
Waiting for index to become available...
Testing performance with Hyperscale optimization...
=== PHASE 2: OPTIMIZED SEARCH ===
Query: "What are the key features of a scalable NoSQL database?"
Search Time: 0.0848 seconds
Results Found: 3 documents
[Result 1]
Vector Distance: 0.586197 (lower = more similar)
Document Content: Couchbase Server is a multipurpose, distributed database that fuses the strengths of relational databases such as SQL and ACID transactions with JSON's versatility, with a foundation that is extremely fast and scalable.
[Result 2]
Vector Distance: 0.645435 (lower = more similar)
Document Content: It's used across industries for things like user profiles, dynamic product catalogs, GenAI apps, vector search, high-speed caching, and much more.
[Result 3]
Vector Distance: 0.976888 (lower = more similar)
Document Content: this is a sample text with the data "hello"
Now let's show how caching can improve performance for repeated queries. Note: Caching benefits apply to both raw searches and optimized searches.
# Set up Couchbase cache (can be applied to any search approach)
print("Setting up Couchbase cache for improved performance on repeated queries...")
cache = CouchbaseCache(
    cluster=cluster,
    bucket_name=couchbase_bucket,
    scope_name=couchbase_scope,
    collection_name=couchbase_collection,
)
set_llm_cache(cache)
print("✓ Couchbase cache enabled!")

# Test cache benefits with the same query (should show improvement on second run)
cache_query = "How does a distributed database handle high-speed operations?"
print("\nTesting cache benefits with a different query...")

print("First execution (cache miss):")
cache_time_1, _ = search_with_performance_metrics(
    cache_query, "Phase 3a: First Query (Cache Miss)", k=2
)

print("\nSecond execution (cache hit):")
cache_time_2, _ = search_with_performance_metrics(
    cache_query, "Phase 3b: Repeated Query (Cache Hit)", k=2
)
Setting up Couchbase cache for improved performance on repeated queries...
✓ Couchbase cache enabled!
Testing cache benefits with a different query...
First execution (cache miss):
=== PHASE 3A: FIRST QUERY (CACHE MISS) ===
Query: "How does a distributed database handle high-speed operations?"
Search Time: 0.1024 seconds
Results Found: 2 documents
[Result 1]
Vector Distance: 0.632770 (lower = more similar)
Document Content: Couchbase Server is a multipurpose, distributed database that fuses the strengths of relational databases such as SQL and ACID transactions with JSON's versatility, with a foundation that is extremely fast and scalable.
[Result 2]
Vector Distance: 0.677951 (lower = more similar)
Document Content: It's used across industries for things like user profiles, dynamic product catalogs, GenAI apps, vector search, high-speed caching, and much more.
Second execution (cache hit):
=== PHASE 3B: REPEATED QUERY (CACHE HIT) ===
Query: "How does a distributed database handle high-speed operations?"
Search Time: 0.0289 seconds
Results Found: 2 documents
[Result 1]
Vector Distance: 0.632770 (lower = more similar)
Document Content: Couchbase Server is a multipurpose, distributed database that fuses the strengths of relational databases such as SQL and ACID transactions with JSON's versatility, with a foundation that is extremely fast and scalable.
[Result 2]
Vector Distance: 0.677951 (lower = more similar)
Document Content: It's used across industries for things like user profiles, dynamic product catalogs, GenAI apps, vector search, high-speed caching, and much more.
Let's analyze the complete performance improvements across all optimization levels:
print("\n" + "="*80)
print("VECTOR SEARCH PERFORMANCE OPTIMIZATION SUMMARY")
print("="*80)
print(f"Phase 1 - Baseline (Raw Search): {baseline_time:.4f} seconds")
print(f"Phase 2 - Optimized Search: {optimized_time:.4f} seconds")
print(f"Phase 3 - Cache Benefits:")
print(f" First execution (cache miss): {cache_time_1:.4f} seconds")
print(f" Second execution (cache hit): {cache_time_2:.4f} seconds")
print("\n" + "-"*80)
print("OPTIMIZATION IMPACT ANALYSIS:")
print("-"*80)
# Vector Index improvement analysis
if optimized_time and baseline_time and optimized_time < baseline_time:
    index_speedup = baseline_time / optimized_time
    index_improvement = ((baseline_time - optimized_time) / baseline_time) * 100
    print(f"Vector Index Benefit: {index_speedup:.2f}x faster ({index_improvement:.1f}% improvement)")
else:
    print("Vector Index Benefit: Performance similar to baseline (may vary with dataset size)")

# Cache improvement analysis
if cache_time_2 and cache_time_1 and cache_time_2 < cache_time_1:
    cache_speedup = cache_time_1 / cache_time_2
    cache_improvement = ((cache_time_1 - cache_time_2) / cache_time_1) * 100
    print(f"Cache Benefit: {cache_speedup:.2f}x faster ({cache_improvement:.1f}% improvement)")
else:
    print("Cache Benefit: No significant improvement (results may be cached already)")
print(f"\nKey Insights:")
print(f"• Hyperscale optimization provides consistent performance benefits, especially with larger datasets")
print(f"• Caching benefits apply to both raw and optimized searches")
print(f"• Combined Hyperscale + Cache provides the best performance for production applications")
print(f"• Hyperscale indexes scale to billions of vectors with optimized concurrent operations")
================================================================================
VECTOR SEARCH PERFORMANCE OPTIMIZATION SUMMARY
================================================================================
Phase 1 - Baseline (Raw Search): 0.1484 seconds
Phase 2 - Optimized Search: 0.0848 seconds
Phase 3 - Cache Benefits:
First execution (cache miss): 0.1024 seconds
Second execution (cache hit): 0.0289 seconds
--------------------------------------------------------------------------------
OPTIMIZATION IMPACT ANALYSIS:
--------------------------------------------------------------------------------
Vector Index Benefit: 1.75x faster (42.8% improvement)
Cache Benefit: 3.55x faster (71.8% improvement)
Key Insights:
• Hyperscale optimization provides consistent performance benefits, especially with larger datasets
• Caching benefits apply to both raw and optimized searches
• Combined Hyperscale + Cache provides the best performance for production applications
• Hyperscale indexes scale to billions of vectors with optimized concurrent operations
Try your own queries with the optimized Hyperscale search system:
custom_query = input("Enter your search query: ")
search_with_performance_metrics(custom_query, "Interactive Optimized Search")
=== INTERACTIVE OPTIMIZED SEARCH ===
Query: "What is the sample data?"
Search Time: 0.0812 seconds
Results Found: 3 documents
[Result 1]
Vector Distance: 0.623644 (lower = more similar)
Document Content: this is a sample text with the data "hello"
[Result 2]
Vector Distance: 0.860599 (lower = more similar)
Document Content: It's used across industries for things like user profiles, dynamic product catalogs, GenAI apps, vector search, high-speed caching, and much more.
[Result 3]
Vector Distance: 0.909207 (lower = more similar)
Document Content: Couchbase Server is a multipurpose, distributed database that fuses the strengths of relational databases such as SQL and ACID transactions with JSON's versatility, with a foundation that is extremely fast and scalable.
(0.08118820190429688,
[(Document(id='e20a8dcd8b464e8e819b87c9a0ff05c3', metadata={}, page_content='this is a sample text with the data "hello"'),
0.6236441411684932),
(Document(id='0442f351aec2415481138315d492ee80', metadata={}, page_content="It's used across industries for things like user profiles, dynamic product catalogs, GenAI apps, vector search, high-speed caching, and much more."),
0.8605992009935179),
(Document(id='7c601881e4bf4c53b5b4c2a25628d904', metadata={}, page_content="Couchbase Server is a multipurpose, distributed database that fuses the strengths of relational databases such as SQL and ACID transactions with JSON's versatility, with a foundation that is extremely fast and scalable."),
0.9092065785676496)])
You have successfully built a powerful semantic search engine using Couchbase's Hyperscale and Composite Vector Indexes with Hugging Face embeddings. This guide has walked you through the complete process of creating a high-performance vector search system that can scale to handle billions of documents.