In this guide, we will walk you through building a powerful semantic search engine using Couchbase as the backend database and Hugging Face as the AI-powered embedding model provider. Semantic search goes beyond simple keyword matching by understanding the context and meaning behind the words in a query, making it an essential tool for applications that require intelligent information retrieval.
This tutorial demonstrates how to leverage Couchbase's Hyperscale and Composite Vector Indexes with Hugging Face embeddings to create a high-performance semantic search system. These vector indexes offer significant advantages over Search Vector Index approaches, particularly for vector-first workloads and scenarios requiring complex filtering with high query-per-second (QPS) performance.
For more information on Hyperscale and Composite Vector Indexes, see the Couchbase Vector Index Documentation.
This guide is designed to be comprehensive yet accessible, with clear step-by-step instructions that will equip you with the knowledge to create a fully functional semantic search system. Whether you're building a recommendation engine, content discovery platform, or any application requiring intelligent document retrieval, this tutorial provides the foundation you need.
Note: If you want to perform semantic search using the Search Vector Index instead, please take a look at this alternative tutorial.
This tutorial is available as a Jupyter Notebook (.ipynb file) that you can run interactively. You can access the original notebook here.
You can either download the notebook file and run it on Google Colab or run it on your system by setting up the Python environment.
!pip install --quiet langchain-couchbase==1.0.1 transformers==4.56.1 sentence_transformers==5.1.0 langchain_huggingface python-dotenv==1.1.1 ipywidgets
from pathlib import Path
from datetime import timedelta
from transformers import pipeline, AutoModel, AutoTokenizer
from langchain_huggingface.embeddings.huggingface import HuggingFaceEmbeddings
from couchbase.auth import PasswordAuthenticator
from couchbase.cluster import Cluster
from couchbase.options import ClusterOptions
from langchain_core.globals import set_llm_cache
from langchain_couchbase.cache import CouchbaseCache
from langchain_couchbase.vectorstores import CouchbaseQueryVectorStore
from langchain_couchbase.vectorstores import DistanceStrategy
from langchain_couchbase.vectorstores import IndexType
import getpass
import os
from dotenv import load_dotenv
To run this tutorial successfully, you will need the following requirements:
Version Requirements:
Access Requirements:
To get started with Couchbase Capella, create an account and use it to deploy a forever free tier operational cluster. This account provides you with an environment where you can explore and learn about Capella with no time constraint.
To learn more, please follow the instructions.
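The configuration cell below reads its settings from a .env file. Here is a minimal example with placeholder values (the cluster URL, bucket, scope, and collection names shown are purely illustrative; substitute your own Capella details):

```shell
# .env — placeholder values only; replace with your own cluster details
CB_CLUSTER_URL=couchbases://cb.example.cloud.couchbase.com
CB_USERNAME=your-username
CB_PASSWORD=your-password
CB_BUCKET=your-bucket
CB_SCOPE=your-scope
CB_COLLECTION=your-collection
```

Keep this file out of version control, since it contains credentials.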
When running Couchbase using Capella, the following prerequisites need to be met:
langchain-couchbase==1.0.1
transformers==4.56.1
sentence_transformers==5.1.0
# Load environment variables
load_dotenv("./.env")
# Configuration
couchbase_cluster_url = os.getenv('CB_CLUSTER_URL') or input("Couchbase Cluster URL:")
couchbase_username = os.getenv('CB_USERNAME') or input("Couchbase Username:")
couchbase_password = os.getenv('CB_PASSWORD') or getpass.getpass("Couchbase password:")
couchbase_bucket = os.getenv('CB_BUCKET') or input("Couchbase Bucket:")
couchbase_scope = os.getenv('CB_SCOPE') or input("Couchbase Scope:")
couchbase_collection = os.getenv('CB_COLLECTION') or input("Couchbase Collection:")
In this section, we first need to create a PasswordAuthenticator object that holds our Couchbase credentials:
auth = PasswordAuthenticator(
    couchbase_username,
    couchbase_password
)
Then, we use this object to connect to the Couchbase cluster and select the bucket, scope, and collection specified above:
print("Connecting to cluster at URL: " + couchbase_cluster_url)
cluster = Cluster(couchbase_cluster_url, ClusterOptions(auth))
cluster.wait_until_ready(timedelta(seconds=5))
bucket = cluster.bucket(couchbase_bucket)
scope = bucket.scope(couchbase_scope)
collection = scope.collection(couchbase_collection)
print("Connected to the cluster")
Connecting to cluster at URL: couchbase://localhost
Connected to the cluster
With Couchbase 8.0+, you can leverage the power of Hyperscale and Composite Vector Indexes, which offer significant performance improvements over Search Vector Index approaches for vector-first workloads. These indexes provide high-performance vector similarity search with advanced filtering capabilities and are designed to scale to billions of vectors.
| Feature | Hyperscale & Composite Vector Index | Search Vector Index |
|---|---|---|
| Best For | Vector-first workloads, complex filtering, high QPS performance | Hybrid search and high recall rates |
| Couchbase Version | 8.0.0+ | 7.6+ |
| Filtering | Pre-filtering with WHERE clauses (Composite) or post-filtering (Hyperscale) | Pre-filtering with flexible ordering |
| Scalability | Up to billions of vectors (Hyperscale) | Up to 10 million vectors |
| Performance | Optimized for concurrent operations with low memory footprint | Good for mixed text and vector queries |
Couchbase offers two distinct query-based vector index types, each optimized for different use cases:
In this tutorial, we'll demonstrate creating a Hyperscale Vector Index and running vector similarity queries. Hyperscale is ideal for semantic search scenarios where you want:
The Hyperscale Vector Index will provide optimal performance for our Hugging Face embedding-based semantic search implementation.
If your use case requires complex filtering with scalar attributes, you may want to consider using a Composite Vector Index instead:
# Alternative: Create a Composite index for filtered searches
vector_store.create_index(
    index_type=IndexType.COMPOSITE,
    index_description="IVF,SQ8",
    distance_metric=DistanceStrategy.COSINE,
    index_name="huggingface_composite_index",
)
Use Composite indexes when:
Note: Composite indexes enable pre-filtering with scalar attributes, making them ideal for applications where you need to search within specific categories, date ranges, or user-specific data segments.
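To make the filtering distinction concrete, here is a small pure-Python sketch (toy documents and toy distances, not the Couchbase API): pre-filtering restricts candidates by a scalar attribute before ranking by vector distance, while post-filtering ranks everything first and discards non-matching results afterwards.

```python
# Toy illustration of pre-filtering vs. post-filtering (not Couchbase API).
docs = [
    {"text": "doc A", "category": "db", "distance": 0.30},
    {"text": "doc B", "category": "ai", "distance": 0.10},
    {"text": "doc C", "category": "db", "distance": 0.50},
]

def pre_filter_search(docs, category, k):
    # Composite-style: restrict to the matching category first, then rank.
    candidates = [d for d in docs if d["category"] == category]
    return sorted(candidates, key=lambda d: d["distance"])[:k]

def post_filter_search(docs, category, k):
    # Hyperscale-style: rank all documents, then drop non-matching results.
    ranked = sorted(docs, key=lambda d: d["distance"])[:k]
    return [d for d in ranked if d["category"] == category]

print([d["text"] for d in pre_filter_search(docs, "db", k=2)])   # ['doc A', 'doc C']
print([d["text"] for d in post_filter_search(docs, "db", k=2)])  # ['doc A']
```

Note how post-filtering can return fewer than k matching results, which is why Composite indexes tend to be the better fit when scalar filters are selective.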
Before creating our Hyperscale index, it's important to understand the configuration parameters that optimize vector storage and search performance. The index_description parameter controls how Couchbase optimizes vector storage through centroids and quantization.
The index_description parameter follows the format 'IVF[<centroids>],{PQ|SQ}<settings>'. When the centroid count is omitted (e.g., IVF,SQ8), Couchbase auto-selects it based on dataset size.
Scalar Quantization (SQ):
- SQ4, SQ6, SQ8 (4, 6, or 8 bits per dimension)
Product Quantization (PQ):
- PQ<subquantizers>x<bits> (e.g., PQ32x8)
Common examples:
- IVF,SQ8 - Auto centroids, 8-bit scalar quantization (good default)
- IVF1000,SQ6 - 1000 centroids, 6-bit scalar quantization
- IVF,PQ32x8 - Auto centroids, 32 subquantizers with 8 bits
For detailed configuration options, see the Quantization & Centroid Settings.
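To see why SQ8 keeps the memory footprint low, a rough back-of-the-envelope estimate helps. The sketch below is an illustration only: it ignores centroid and index-metadata overhead, and the 768-dimension figure is simply a common sentence-transformer embedding size, not a value taken from this tutorial's model.

```python
# Approximate storage for raw float32 vectors vs. SQ8-quantized vectors.
# Illustration only: ignores index metadata and centroid overhead.
def vector_storage_bytes(num_vectors, dims, bits_per_dim):
    return num_vectors * dims * bits_per_dim // 8

num_vectors = 1_000_000
dims = 768  # a common sentence-transformer embedding dimensionality

raw = vector_storage_bytes(num_vectors, dims, 32)  # float32: 32 bits/dim
sq8 = vector_storage_bytes(num_vectors, dims, 8)   # SQ8: 8 bits/dim

print(f"float32: {raw / 1e9:.2f} GB")  # 3.07 GB
print(f"SQ8:     {sq8 / 1e9:.2f} GB")  # 0.77 GB (4x smaller)
```

The same arithmetic shows why SQ6 and SQ4 trade additional accuracy for even smaller footprints.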
For more information on Hyperscale and Composite Vector Indexes, see Couchbase Vector Index Documentation.
In this tutorial, we use IVF,SQ8 which provides:
# Create a Hyperscale Vector Index store (good default: IVF,SQ8)
vector_store = CouchbaseQueryVectorStore(
    cluster=cluster,
    bucket_name=couchbase_bucket,
    scope_name=couchbase_scope,
    collection_name=couchbase_collection,
    embedding=HuggingFaceEmbeddings(),  # Hugging Face Initialization
    distance_metric=DistanceStrategy.COSINE
)
Now that we have set up our vector store with Hugging Face embeddings, we can add documents to our collection. The CouchbaseQueryVectorStore automatically handles the embedding generation process using the Hugging Face transformers library.
When we add text documents to our vector store, several important processes happen automatically:
In this example, we're adding sample documents that demonstrate Couchbase's capabilities. The system will:
Note: The batch_size parameter controls how many documents are processed together, which can help optimize performance for large document sets.
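Conceptually, batching just splits the document list into fixed-size chunks before embedding and upserting. The sketch below illustrates the idea with a plain Python generator; it is not the library's internal implementation.

```python
# Minimal batching sketch: split a list of texts into fixed-size chunks.
def batched(items, batch_size):
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

docs = [f"document {i}" for i in range(100)]
batches = list(batched(docs, batch_size=32))
print([len(b) for b in batches])  # [32, 32, 32, 4]
```

Larger batches amortize per-call overhead but increase peak memory during embedding, which is the trade-off batch_size lets you tune.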
texts = [
    "Couchbase Server is a multipurpose, distributed database that fuses the strengths of relational databases such as SQL and ACID transactions with JSON’s versatility, with a foundation that is extremely fast and scalable.",
    "It’s used across industries for things like user profiles, dynamic product catalogs, GenAI apps, vector search, high-speed caching, and much more.",
    input("Enter custom embedding text:")
]
vector_store.add_texts(texts=texts, batch_size=32)
['7c601881e4bf4c53b5b4c2a25628d904',
'0442f351aec2415481138315d492ee80',
'e20a8dcd8b464e8e819b87c9a0ff05c3']
Now let's demonstrate the performance benefits of different optimization approaches available in Couchbase. We'll compare three optimization levels to show how each contributes to building a production-ready semantic search system:
Important: Caching is orthogonal to index types - you can apply caching benefits to both raw searches and optimized searches to improve repeated query performance.
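The cache's hit/miss behavior can be illustrated with a toy in-memory memoization wrapper. CouchbaseCache persists entries in a Couchbase collection rather than a Python dict, but the lookup logic is analogous.

```python
# Toy query cache illustrating hit/miss behavior (in-memory sketch;
# CouchbaseCache stores entries in a Couchbase collection instead).
cache = {}

def cached_search(query, search_fn):
    if query in cache:
        return cache[query], "hit"      # served from cache, no backend call
    result = search_fn(query)           # expensive call happens on a miss
    cache[query] = result
    return result, "miss"

calls = []
def fake_search(q):
    calls.append(q)                     # track how often the backend is hit
    return [f"result for {q}"]

_, status1 = cached_search("distributed database", fake_search)
_, status2 = cached_search("distributed database", fake_search)
print(status1, status2, len(calls))     # miss hit 1
```

The second lookup never reaches the backend, which is exactly the effect we measure later when comparing cache-miss and cache-hit timings.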
Before we start our RAG comparisons, let's understand what the search results mean:
When you perform a search query with vector search:
Note: The returned value represents the vector distance between query and document embeddings. Lower distance values indicate higher similarity.
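Since we configured DistanceStrategy.COSINE, the score is a cosine distance, and cosine similarity can typically be recovered as 1 - distance. A quick worked example with toy two-dimensional vectors:

```python
import math

# Cosine distance = 1 - cosine similarity; lower distance means more similar.
def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

query = [1.0, 0.0]
close_doc = [0.9, 0.1]  # points in nearly the same direction as the query
far_doc = [0.0, 1.0]    # orthogonal to the query

print(round(cosine_distance(query, close_doc), 4))  # small distance
print(cosine_distance(query, far_doc))              # 1.0
```

Real embeddings have hundreds of dimensions, but the ordering works the same way: the nearest documents are the ones whose vectors point in the most similar direction.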
Let's create a comprehensive search function for our RAG performance comparison:
import time

def search_with_performance_metrics(query_text, stage_name, k=3):
    """Perform optimized semantic search with detailed performance metrics"""
    print(f"\n=== {stage_name.upper()} ===")
    print(f"Query: \"{query_text}\"")

    start_time = time.time()
    results = vector_store.similarity_search_with_score(query_text, k=k)
    end_time = time.time()

    search_time = end_time - start_time
    print(f"Search Time: {search_time:.4f} seconds")
    print(f"Results Found: {len(results)} documents")

    for i, (doc, distance) in enumerate(results, 1):
        print(f"\n[Result {i}]")
        print(f"Vector Distance: {distance:.6f} (lower = more similar)")
        # Use the document content directly from search results (no additional KV call needed)
        print(f"Document Content: {doc.page_content}")
        if hasattr(doc, 'metadata') and doc.metadata:
            print(f"Metadata: {doc.metadata}")

    return search_time, results
First, let's establish baseline performance with raw vector search - no Hyperscale optimization yet:
test_query = "What are the key features of a scalable NoSQL database?"
print("Testing baseline performance without Hyperscale optimization...")
baseline_time, baseline_results = search_with_performance_metrics(
    test_query, "Phase 1: Baseline Vector Search"
)
Testing baseline performance without Hyperscale optimization...
=== PHASE 1: BASELINE VECTOR SEARCH ===
Query: "What are the key features of a scalable NoSQL database?"
Search Time: 0.1484 seconds
Results Found: 3 documents
[Result 1]
Vector Distance: 0.586197 (lower = more similar)
Document Content: Couchbase Server is a multipurpose, distributed database that fuses the strengths of relational databases such as SQL and ACID transactions with JSON's versatility, with a foundation that is extremely fast and scalable.
[Result 2]
Vector Distance: 0.645435 (lower = more similar)
Document Content: It's used across industries for things like user profiles, dynamic product catalogs, GenAI apps, vector search, high-speed caching, and much more.
[Result 3]
Vector Distance: 0.976888 (lower = more similar)
Document Content: this is a sample text with the data "hello"
Now let's create the Hyperscale Vector Index and measure the performance improvement:
# Create Hyperscale index for optimized vector search
print("Creating Hyperscale Vector Index...")

try:
    vector_store.create_index(
        index_type=IndexType.HYPERSCALE,
        index_description="IVF,SQ8",
        distance_metric=DistanceStrategy.COSINE,
        index_name="huggingface_hyperscale_index",
    )
    print("✓ Hyperscale Vector Index created successfully!")

    # Wait for index to become available
    print("Waiting for index to become available...")
    time.sleep(3)
except Exception as e:
    if "already exists" in str(e).lower():
        print("✓ Hyperscale Vector Index already exists, proceeding...")
    else:
        print(f"Error creating Hyperscale index: {str(e)}")

# Test the same query with Hyperscale optimization
print("\nTesting performance with Hyperscale optimization...")
optimized_time, optimized_results = search_with_performance_metrics(
    test_query, "Phase 2: Optimized Search"
)
Creating Hyperscale Vector Index...
✓ Hyperscale Vector Index created successfully!
Waiting for index to become available...
Testing performance with Hyperscale optimization...
=== PHASE 2: OPTIMIZED SEARCH ===
Query: "What are the key features of a scalable NoSQL database?"
Search Time: 0.0848 seconds
Results Found: 3 documents
[Result 1]
Vector Distance: 0.586197 (lower = more similar)
Document Content: Couchbase Server is a multipurpose, distributed database that fuses the strengths of relational databases such as SQL and ACID transactions with JSON's versatility, with a foundation that is extremely fast and scalable.
[Result 2]
Vector Distance: 0.645435 (lower = more similar)
Document Content: It's used across industries for things like user profiles, dynamic product catalogs, GenAI apps, vector search, high-speed caching, and much more.
[Result 3]
Vector Distance: 0.976888 (lower = more similar)
Document Content: this is a sample text with the data "hello"
Now let's show how caching can improve performance for repeated queries. Note: Caching benefits apply to both raw searches and optimized searches.
# Set up Couchbase cache (can be applied to any search approach)
print("Setting up Couchbase cache for improved performance on repeated queries...")
cache = CouchbaseCache(
    cluster=cluster,
    bucket_name=couchbase_bucket,
    scope_name=couchbase_scope,
    collection_name=couchbase_collection,
)
set_llm_cache(cache)
print("✓ Couchbase cache enabled!")

# Test cache benefits with the same query (should show improvement on second run)
cache_query = "How does a distributed database handle high-speed operations?"
print("\nTesting cache benefits with a different query...")

print("First execution (cache miss):")
cache_time_1, _ = search_with_performance_metrics(
    cache_query, "Phase 3a: First Query (Cache Miss)", k=2
)

print("\nSecond execution (cache hit):")
cache_time_2, _ = search_with_performance_metrics(
    cache_query, "Phase 3b: Repeated Query (Cache Hit)", k=2
)
Setting up Couchbase cache for improved performance on repeated queries...
✓ Couchbase cache enabled!
Testing cache benefits with a different query...
First execution (cache miss):
=== PHASE 3A: FIRST QUERY (CACHE MISS) ===
Query: "How does a distributed database handle high-speed operations?"
Search Time: 0.1024 seconds
Results Found: 2 documents
[Result 1]
Vector Distance: 0.632770 (lower = more similar)
Document Content: Couchbase Server is a multipurpose, distributed database that fuses the strengths of relational databases such as SQL and ACID transactions with JSON's versatility, with a foundation that is extremely fast and scalable.
[Result 2]
Vector Distance: 0.677951 (lower = more similar)
Document Content: It's used across industries for things like user profiles, dynamic product catalogs, GenAI apps, vector search, high-speed caching, and much more.
Second execution (cache hit):
=== PHASE 3B: REPEATED QUERY (CACHE HIT) ===
Query: "How does a distributed database handle high-speed operations?"
Search Time: 0.0289 seconds
Results Found: 2 documents
[Result 1]
Vector Distance: 0.632770 (lower = more similar)
Document Content: Couchbase Server is a multipurpose, distributed database that fuses the strengths of relational databases such as SQL and ACID transactions with JSON's versatility, with a foundation that is extremely fast and scalable.
[Result 2]
Vector Distance: 0.677951 (lower = more similar)
Document Content: It's used across industries for things like user profiles, dynamic product catalogs, GenAI apps, vector search, high-speed caching, and much more.
Let's analyze the complete performance improvements across all optimization levels:
print("\n" + "="*80)
print("VECTOR SEARCH PERFORMANCE OPTIMIZATION SUMMARY")
print("="*80)
print(f"Phase 1 - Baseline (Raw Search): {baseline_time:.4f} seconds")
print(f"Phase 2 - Optimized Search: {optimized_time:.4f} seconds")
print(f"Phase 3 - Cache Benefits:")
print(f" First execution (cache miss): {cache_time_1:.4f} seconds")
print(f" Second execution (cache hit): {cache_time_2:.4f} seconds")
print("\n" + "-"*80)
print("OPTIMIZATION IMPACT ANALYSIS:")
print("-"*80)
# Vector Index improvement analysis
if optimized_time and baseline_time and optimized_time < baseline_time:
    index_speedup = baseline_time / optimized_time
    index_improvement = ((baseline_time - optimized_time) / baseline_time) * 100
    print(f"Vector Index Benefit: {index_speedup:.2f}x faster ({index_improvement:.1f}% improvement)")
else:
    print("Vector Index Benefit: Performance similar to baseline (may vary with dataset size)")

# Cache improvement analysis
if cache_time_2 and cache_time_1 and cache_time_2 < cache_time_1:
    cache_speedup = cache_time_1 / cache_time_2
    cache_improvement = ((cache_time_1 - cache_time_2) / cache_time_1) * 100
    print(f"Cache Benefit: {cache_speedup:.2f}x faster ({cache_improvement:.1f}% improvement)")
else:
    print("Cache Benefit: No significant improvement (results may be cached already)")
print(f"\nKey Insights:")
print(f"• Hyperscale optimization provides consistent performance benefits, especially with larger datasets")
print(f"• Caching benefits apply to both raw and optimized searches")
print(f"• Combined Hyperscale + Cache provides the best performance for production applications")
print(f"• Hyperscale indexes scale to billions of vectors with optimized concurrent operations")
================================================================================
VECTOR SEARCH PERFORMANCE OPTIMIZATION SUMMARY
================================================================================
Phase 1 - Baseline (Raw Search): 0.1484 seconds
Phase 2 - Optimized Search: 0.0848 seconds
Phase 3 - Cache Benefits:
First execution (cache miss): 0.1024 seconds
Second execution (cache hit): 0.0289 seconds
--------------------------------------------------------------------------------
OPTIMIZATION IMPACT ANALYSIS:
--------------------------------------------------------------------------------
Vector Index Benefit: 1.75x faster (42.8% improvement)
Cache Benefit: 3.55x faster (71.8% improvement)
Key Insights:
• Hyperscale optimization provides consistent performance benefits, especially with larger datasets
• Caching benefits apply to both raw and optimized searches
• Combined Hyperscale + Cache provides the best performance for production applications
• Hyperscale indexes scale to billions of vectors with optimized concurrent operations
Try your own queries with the optimized Hyperscale search system:
custom_query = input("Enter your search query: ")
search_with_performance_metrics(custom_query, "Interactive Optimized Search")
=== INTERACTIVE OPTIMIZED SEARCH ===
Query: "What is the sample data?"
Search Time: 0.0812 seconds
Results Found: 3 documents
[Result 1]
Vector Distance: 0.623644 (lower = more similar)
Document Content: this is a sample text with the data "hello"
[Result 2]
Vector Distance: 0.860599 (lower = more similar)
Document Content: It's used across industries for things like user profiles, dynamic product catalogs, GenAI apps, vector search, high-speed caching, and much more.
[Result 3]
Vector Distance: 0.909207 (lower = more similar)
Document Content: Couchbase Server is a multipurpose, distributed database that fuses the strengths of relational databases such as SQL and ACID transactions with JSON's versatility, with a foundation that is extremely fast and scalable.
(0.08118820190429688,
[(Document(id='e20a8dcd8b464e8e819b87c9a0ff05c3', metadata={}, page_content='this is a sample text with the data "hello"'),
0.6236441411684932),
(Document(id='0442f351aec2415481138315d492ee80', metadata={}, page_content="It's used across industries for things like user profiles, dynamic product catalogs, GenAI apps, vector search, high-speed caching, and much more."),
0.8605992009935179),
(Document(id='7c601881e4bf4c53b5b4c2a25628d904', metadata={}, page_content="Couchbase Server is a multipurpose, distributed database that fuses the strengths of relational databases such as SQL and ACID transactions with JSON's versatility, with a foundation that is extremely fast and scalable."),
0.9092065785676496)])
You have successfully built a powerful semantic search engine using Couchbase's Hyperscale and Composite Vector Indexes with Hugging Face embeddings. This guide has walked you through the complete process of creating a high-performance vector search system that can scale to handle billions of documents.