Welcome to this comprehensive guide on constructing an AI-enhanced Chat Application using Couchbase's Hyperscale Vector Index. We will create a dynamic chat interface capable of delving into PDF documents to extract and provide summaries, key facts, and answers to your queries. By the end of this tutorial, you'll have a powerful tool at your disposal, transforming the way you interact with and utilize the information contained within PDFs.
Couchbase supports multiple types of vector indexes for different use cases. This tutorial uses Hyperscale Vector Index, but you should be aware of the alternatives:
- Hyperscale Vector Index (used in this tutorial): purpose-built for high-performance, pure vector search at scale, making it a strong fit for RAG workloads like this one.
- Composite Vector Index: combines a vector key with scalar fields, so you can filter by metadata (e.g., user, date, category) before the vector search runs.
- Search Vector Index: part of the Couchbase Search service; useful when you want to combine vector search with full-text search capabilities.
For this PDF chat demo, we use Hyperscale Vector Index for optimal performance in pure RAG applications. If your use case requires filtering by metadata (e.g., searching only in specific user's documents, or documents from a certain date range), consider using Composite Vector Index instead.
To learn more about choosing the right vector index, refer to the official Couchbase vector index documentation.
This tutorial will demonstrate how to:

- Connect to a Couchbase cluster using the Haystack CouchbaseQueryDocumentStore
- Ingest a PDF into Couchbase as embedded chunks using a Haystack indexing pipeline
- Automatically create a Hyperscale vector index with SQL++
- Build a RAG pipeline that answers questions using approximate nearest neighbor vector search
- Serve it all through an interactive Streamlit chat interface
Note that this tutorial is designed to work with the latest couchbase-haystack SDK (version 2.1.0+). It will not work with older versions.
Important: This tutorial uses a Hyperscale vector index, which is optimized for high-performance vector search at scale. For Couchbase Server 7.6, consider using the Search Vector Index based approach instead.
```bash
git clone https://github.com/couchbase-examples/haystack-demo.git
```

Any dependencies should be installed through pip, the default package manager for Python. You may use a virtual environment as well.

```bash
python -m pip install -r requirements.txt
```

To know more about connecting to your Capella cluster, please follow the instructions.
Specifically, you need to do the following:

- Create a bucket named sample-bucket. We will use the pdf_data scope and pdf_chunks collection of this bucket.

Good news! The application automatically creates the Hyperscale vector index after you upload your first PDF. This is because Hyperscale indexes require training data (documents with embeddings) to be created successfully.
The application will:

- Create the pdf_data scope and pdf_chunks collection on startup if they do not exist
- Create the Hyperscale vector index automatically after the first PDF is uploaded and its embeddings are written
No manual index creation required! The index is created with the following configuration:

```sql
CREATE VECTOR INDEX idx_<collection_name>_vector
ON `<bucket_name>`.`<scope_name>`.`<collection_name>`(embedding VECTOR)
WITH {
    "dimension": 1536,
    "similarity": "DOT"
}
```

- The indexed field is embedding, with 1536 dimensions (matching OpenAI's text-embedding-ada-002 and text-embedding-3-small models)
- The similarity metric is DOT, for dot product similarity (optimized for OpenAI embeddings)

The application uses CouchbaseQueryDocumentStore, which leverages SQL++ queries with the APPROX_VECTOR_DISTANCE() function for fast approximate nearest neighbor (ANN) search.
Note: If automatic creation fails, the application will attempt to create the index on first query as a fallback. If you prefer manual control, you can create the index yourself using the SQL++ query above after uploading documents.
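If you want to confirm that the index exists and is online, you can query the standard system:indexes catalog. A quick check for this tutorial's keyspace (using the bucket, scope, and collection names configured above) might look like:

```sql
SELECT name, state
FROM system:indexes
WHERE bucket_id = "sample-bucket"
  AND scope_id = "pdf_data"
  AND keyspace_id = "pdf_chunks";
```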
If your application needs to filter documents by metadata before performing vector search, you can use a Composite Vector Index instead. The same code works with minimal changes!
Creating a Composite Vector Index:
```sql
CREATE INDEX idx_<collection_name>_composite
ON `<bucket_name>`.`<scope_name>`.`<collection_name>`(embedding VECTOR)
WITH {
    "dimension": 1536,
    "similarity": "DOT"
}
```

Key Difference: Notice the keyword is INDEX (not VECTOR INDEX). Composite indexes combine vector fields with other scalar fields.
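For instance, a composite index that also keys on a hypothetical user_id field, so that queries can filter by user before the ANN search, might look like this. This is an illustrative sketch, not part of the demo app:

```sql
CREATE INDEX idx_pdf_chunks_user_composite
ON `sample-bucket`.`pdf_data`.`pdf_chunks`(user_id, embedding VECTOR)
WITH {
    "dimension": 1536,
    "similarity": "DOT"
}
```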
When to Use Composite Vector Index:

- Your queries need to filter on scalar fields (e.g., user_id, date, category) before vector search

Code Compatibility:
The same CouchbaseQueryDocumentStore and application code works with both Hyperscale and Composite indexes! The only difference is the SQL++ statement you use to create the index (CREATE VECTOR INDEX vs CREATE INDEX).
To learn more about Composite Vector Indexes, refer to the official documentation.
Copy the secrets.example.toml file in the .streamlit folder, rename it to secrets.toml, and replace the placeholders with the actual values for your environment. All configuration for communication with the database is read from the environment variables.
```toml
DB_CONN_STR = "<couchbase_cluster_connection_string>"
DB_USERNAME = "<couchbase_username>"
DB_PASSWORD = "<couchbase_password>"
DB_BUCKET = "<bucket_name>"
DB_SCOPE = "<scope_name>"
DB_COLLECTION = "<collection_name>"
OPENAI_API_KEY = "<openai_api_key>"
```

The OpenAI API key is required for generating embeddings and querying the LLM.
The connection string is expected to include the couchbases:// or couchbase:// part. For this tutorial, use DB_BUCKET = sample-bucket, DB_SCOPE = pdf_data, and DB_COLLECTION = pdf_chunks.
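Putting it together, a filled-in secrets.toml for this tutorial would look something like the following. The connection string, credentials, and API key shown here are placeholders for your own values:

```toml
DB_CONN_STR = "couchbases://cb.example.cloud.couchbase.com"  # placeholder endpoint
DB_USERNAME = "your_username"
DB_PASSWORD = "your_password"
DB_BUCKET = "sample-bucket"
DB_SCOPE = "pdf_data"
DB_COLLECTION = "pdf_chunks"
OPENAI_API_KEY = "sk-..."
```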
After starting the Couchbase server and installing the dependencies, our application is ready to run. The application will automatically create the scope and collection on startup.

In the project's root directory, run the following command:
```bash
streamlit run chat_with_pdf_with_query_vector_index.py
```

The application will run on your local machine at http://localhost:8501.
Important: Upload a PDF first! The Hyperscale vector index will be automatically created after you upload and process your first PDF document. The index creation requires training data (documents with embeddings) to work properly.
On the left sidebar, you'll find an option to upload a PDF document you want to use with this PDF Chat App. Depending on the size of the PDF, the upload process may take some time.
In the main area, there's a chat screen where you can ask questions about the uploaded PDF document. You will receive two responses: one with context from the PDF (marked with the Couchbase logo) and one without the PDF context (marked with the bot logo, 🤖). This demonstrates how Retrieval-Augmented Generation (RAG) enhances the answers provided by the language model using the PDF content.
The PDF Chat application leverages two powerful concepts: Retrieval-Augmented Generation (RAG) and Vector Search. Together, these techniques enable efficient and context-aware interactions with PDF documents.
RAG is like having two helpers:

- A retriever, which looks through the stored PDF chunks and finds the passages most relevant to your question
- A generator (the LLM), which reads those passages and writes the answer

Here's how RAG works:

- The user's question is converted into an embedding
- The most similar PDF chunks are retrieved from the vector store
- The retrieved chunks are added to the prompt alongside the question
- The LLM generates an answer grounded in that context

This enriches the prompt with context from the PDF, so the LLM is able to give relevant, document-grounded results rather than generalized ones.
Couchbase is a NoSQL database that provides a powerful Hyperscale Vector Search capability. It allows you to store and search through high-dimensional vector representations (embeddings) of textual data at massive scale.
The PDF Chat app uses Haystack to convert the text from the PDF documents into embeddings. These embeddings are then stored in a Couchbase bucket, along with the corresponding text.
When a user asks a question or provides a prompt:

- The question is converted into an embedding using the same OpenAI embedding model used for the PDF chunks
- A SQL++ query is issued using the APPROX_VECTOR_DISTANCE() function and the user's query embedding. Couchbase's Hyperscale Vector Search calculates the similarity (e.g., dot product) between the query embedding and the indexed PDF embeddings using approximate nearest neighbor (ANN) search, enabling ultra-fast retrieval.
- All of this is wrapped by the CouchbaseQueryDocumentStore class, abstracting away the complexities of vector similarity calculations.

Haystack is a powerful library that simplifies the process of building applications with large language models (LLMs) and vector stores like Couchbase.
In the PDF Chat app, Haystack is used for several tasks:

- Converting, cleaning, and splitting the PDF into chunks (PyPDFToDocument, DocumentCleaner, DocumentSplitter)
- Generating embeddings for chunks and queries (OpenAIDocumentEmbedder, OpenAITextEmbedder)
- Writing to and retrieving from the Couchbase vector store (DocumentWriter, CouchbaseQueryEmbeddingRetriever)
- Building prompts and generating answers with the LLM (PromptBuilder, OpenAIGenerator, AnswerBuilder)
By combining Vector Search with Couchbase, RAG, and Haystack, the PDF Chat app can efficiently ingest PDF documents, convert their content into searchable embeddings, retrieve relevant information based on user queries and conversation context, and generate context-aware and informative responses using large language models.
To begin this tutorial, clone the repo and open it up in the IDE of your choice. Now you can learn how to create the PDF Chat App. The whole application is written in the chat_with_pdf_with_query_vector_index.py file.
The fundamental workflow of the application is as follows: the user initiates the process from the main page's sidebar by uploading a PDF. This action triggers the save_to_vector_store function, which uploads the PDF into the Couchbase vector store. After this, the user can chat with the LLM.

In the chat area, the user can pose questions. These inquiries are processed by the Chat API, which consults the LLM for responses, aided by the context provided by RAG. The assistant then delivers the answer, and the user has the option to ask additional questions.
The first step is connecting to Couchbase. Couchbase Hyperscale Vector Search is required for PDF upload as well as during chat (for retrieval). We will use the Haystack CouchbaseQueryDocumentStore to connect to the Couchbase cluster with Hyperscale vector search support. The connection is established in the get_document_store function.
Note: The same CouchbaseQueryDocumentStore configuration works with both Hyperscale and Composite vector indexes! You only need to change the SQL++ statement when creating the index.
The connection string and credentials are read from the environment variables. We perform some basic checks for required environment variables that are not set in secrets.toml, and then proceed to connect to the Couchbase cluster.
```python
import os
from datetime import timedelta

import streamlit as st
from haystack.utils import Secret
from couchbase.n1ql import QueryScanConsistency
from couchbase_haystack import (
    CouchbaseQueryDocumentStore,
    CouchbasePasswordAuthenticator,
    CouchbaseClusterOptions,
    QueryVectorSearchType,
    QueryVectorSearchSimilarity,
    CouchbaseQueryOptions,
)


@st.cache_resource(show_spinner="Connecting to Vector Store")
def get_document_store():
    """Return the Couchbase document store using CouchbaseQueryDocumentStore."""
    return CouchbaseQueryDocumentStore(
        cluster_connection_string=Secret.from_env_var("DB_CONN_STR"),
        authenticator=CouchbasePasswordAuthenticator(
            username=Secret.from_env_var("DB_USERNAME"),
            password=Secret.from_env_var("DB_PASSWORD"),
        ),
        cluster_options=CouchbaseClusterOptions(profile="wan_development"),
        bucket=os.getenv("DB_BUCKET"),
        scope=os.getenv("DB_SCOPE"),
        collection=os.getenv("DB_COLLECTION"),
        search_type=QueryVectorSearchType.ANN,
        similarity=QueryVectorSearchSimilarity.DOT,
        nprobes=10,
        query_options=CouchbaseQueryOptions(
            timeout=timedelta(seconds=60),
            scan_consistency=QueryScanConsistency.NOT_BOUNDED,
        ),
    )
```

We will define the bucket, scope, and collection names from environment variables.
Configuration Parameters:
- search_type=QueryVectorSearchType.ANN: uses approximate nearest neighbor search for fast retrieval
- similarity=QueryVectorSearchSimilarity.DOT: uses dot product similarity (optimized for OpenAI embeddings)
- nprobes=10: controls the search accuracy/speed tradeoff (higher = more accurate but slower)
- query_options: configures the query timeout and consistency settings

We will now initialize the CouchbaseQueryDocumentStore, which will be used for storing and retrieving document embeddings.
```python
# Initialize document store
document_store = get_document_store()
```

The save_to_vector_store function takes care of uploading the PDF file in vector format to the Couchbase database using the Haystack indexing pipeline. It converts the PDF to documents, cleans them, splits the text into small chunks, generates embeddings for those chunks, and ingests the chunks and their embeddings into the Couchbase vector store.
This part of the code creates a file uploader in the sidebar using the Streamlit library. After a PDF is uploaded, the save_to_vector_store function is called to further process the PDF.
```python
with st.form("upload pdf"):
    uploaded_file = st.file_uploader(
        "Choose a PDF.",
        help="The document will be deleted after one hour of inactivity (TTL).",
        type="pdf",
    )
    submitted = st.form_submit_button("Upload")
    if submitted:
        save_to_vector_store(uploaded_file, indexing_pipeline)
```

The save_to_vector_store function ensures that the uploaded PDF file is properly handled, loaded, and prepared for storage in the vector store. It first checks whether a file was actually uploaded, then saves the uploaded file to a temporary location.
The indexing pipeline takes care of converting the PDF to documents, cleaning them, splitting them into chunks, generating embeddings, and storing them in the Couchbase vector store.
```python
import tempfile


def save_to_vector_store(uploaded_file, indexing_pipeline):
    """Process the PDF & store it in Couchbase Vector Store"""
    if uploaded_file is not None:
        temp_dir = tempfile.TemporaryDirectory()
        temp_file_path = os.path.join(temp_dir.name, uploaded_file.name)
        with open(temp_file_path, "wb") as f:
            f.write(uploaded_file.getvalue())
        result = indexing_pipeline.run({"converter": {"sources": [temp_file_path]}})
        st.info(f"PDF loaded into vector store: {result['writer']['documents_written']} documents indexed")
```

The indexing pipeline is created to handle the entire process of ingesting PDFs into the vector store. It includes the following components:

- converter (PyPDFToDocument): converts the PDF into Haystack documents
- cleaner (DocumentCleaner): cleans up the extracted text
- splitter (DocumentSplitter): splits the text into 250-word chunks with a 50-word overlap
- embedder (OpenAIDocumentEmbedder): generates an embedding for each chunk
- writer (DocumentWriter): writes the chunks and their embeddings to the Couchbase document store
```python
from haystack import Pipeline
from haystack.components.converters import PyPDFToDocument
from haystack.components.preprocessors import DocumentCleaner, DocumentSplitter
from haystack.components.embedders import OpenAIDocumentEmbedder, OpenAITextEmbedder
from haystack.components.generators import OpenAIGenerator
from haystack.components.builders import PromptBuilder, AnswerBuilder
from haystack.components.writers import DocumentWriter

indexing_pipeline = Pipeline()
indexing_pipeline.add_component("converter", PyPDFToDocument())
indexing_pipeline.add_component("cleaner", DocumentCleaner())
indexing_pipeline.add_component("splitter", DocumentSplitter(split_by="word", split_length=250, split_overlap=50))
indexing_pipeline.add_component("embedder", OpenAIDocumentEmbedder())
indexing_pipeline.add_component("writer", DocumentWriter(document_store=document_store))

indexing_pipeline.connect("converter.documents", "cleaner.documents")
indexing_pipeline.connect("cleaner.documents", "splitter.documents")
indexing_pipeline.connect("splitter.documents", "embedder.documents")
indexing_pipeline.connect("embedder.documents", "writer.documents")
```

After the documents are written to Couchbase, the application automatically creates the Hyperscale vector index using SQL++.
After uploading the PDF into Couchbase, we are now ready to utilize the power of Couchbase Vector Search, RAG, and the LLM to get context-based answers to our questions. When the user asks a question, the assistant (LLM) is called with the RAG context, and the assistant's response is sent back to the user.
We create a RAG (Retrieval-Augmented Generation) pipeline using Haystack components. This pipeline handles the entire process of retrieving relevant context from the vector store and generating responses using the LLM.
The OpenAIGenerator is a crucial component in our RAG pipeline, responsible for generating human-like responses based on the retrieved context and user questions. Its configuration (the OpenAI API key and the model name) appears in the pipeline code below. By using the OpenAIGenerator, we leverage state-of-the-art natural language processing capabilities to provide accurate, context-aware answers to user queries about the uploaded PDF content.
```python
from haystack.components.embedders import OpenAITextEmbedder
from couchbase_haystack import CouchbaseQueryEmbeddingRetriever
from haystack.components.builders import PromptBuilder, AnswerBuilder

rag_pipeline = Pipeline()
rag_pipeline.add_component("query_embedder", OpenAITextEmbedder())
rag_pipeline.add_component("retriever", CouchbaseQueryEmbeddingRetriever(document_store=document_store))
rag_pipeline.add_component(
    "prompt_builder",
    PromptBuilder(
        template="""
        You are a helpful bot. If you cannot answer based on the context provided, respond with a generic answer. Answer the question as truthfully as possible using the context below:
        {% for doc in documents %}
        {{ doc.content }}
        {% endfor %}

        Question: {{question}}
        """
    ),
)
rag_pipeline.add_component(
    "llm",
    OpenAIGenerator(
        api_key=OPENAI_API_KEY,
        model="gpt-5",
    ),
)
rag_pipeline.add_component("answer_builder", AnswerBuilder())

rag_pipeline.connect("query_embedder", "retriever.query_embedding")
rag_pipeline.connect("retriever.documents", "prompt_builder.documents")
rag_pipeline.connect("prompt_builder.prompt", "llm.prompt")
rag_pipeline.connect("llm.replies", "answer_builder.replies")
rag_pipeline.connect("llm.meta", "answer_builder.meta")
rag_pipeline.connect("retriever", "answer_builder.documents")
```

Key Point: We use CouchbaseQueryEmbeddingRetriever, which queries the Hyperscale vector index using SQL++ with the APPROX_VECTOR_DISTANCE() function.
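Conceptually, the retriever's lookup corresponds to a SQL++ ANN query along the following lines. This is an illustrative sketch of the query shape, not the exact statement the SDK generates; $query_embedding stands for the embedding produced from the user's question:

```sql
SELECT meta().id, t.content
FROM `sample-bucket`.`pdf_data`.`pdf_chunks` AS t
ORDER BY APPROX_VECTOR_DISTANCE(t.embedding, $query_embedding, "DOT")
LIMIT 3;
```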
This section creates an interactive chat interface where users can ask questions based on the uploaded PDF. The key steps are:

- Capture the user's question with st.chat_input
- Display the question and append it to the chat history in st.session_state.messages
- Run the RAG pipeline with the question (embedding it, retrieving the top 3 chunks, building the prompt, and generating an answer)
- Display the assistant's answer and append it to the chat history
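Note that the snippet below assumes the chat history has already been initialized earlier in the script. A minimal guard (our assumption; the excerpt doesn't show this setup) would be:

```python
# Initialize the chat history once per session (assumed setup, not shown in the excerpt above)
if "messages" not in st.session_state:
    st.session_state.messages = []
```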
```python
if question := st.chat_input("Ask a question based on the PDF"):
    st.chat_message("user").markdown(question)
    st.session_state.messages.append({"role": "user", "content": question, "avatar": "👤"})

    # RAG response
    with st.chat_message("assistant", avatar=couchbase_logo):
        message_placeholder = st.empty()
        rag_result = rag_pipeline.run(
            {
                "query_embedder": {"text": question},
                "retriever": {"top_k": 3},
                "prompt_builder": {"question": question},
                "answer_builder": {"query": question},
            }
        )
        rag_response = rag_result["answer_builder"]["answers"][0].data
        message_placeholder.markdown(rag_response)
        st.session_state.messages.append({"role": "assistant", "content": rag_response, "avatar": couchbase_logo})
```
This setup allows users to have a conversational experience, asking questions related to the uploaded PDF, with responses generated by the RAG pipeline. Both the user's questions and the assistant's responses are displayed in the chat interface, along with their respective roles and avatars.
Similar to the last section, we also get an answer to the user's question from the LLM alone, without the retrieved PDF context. These answers are shown in the UI as well, to showcase how RAG gives better, more context-aware results. Because the RAG pipeline above always runs the retriever, the no-context comparison is assumed here to use a second, retrieval-free pipeline (see the sketch below).
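A minimal sketch of that second pipeline, reusing the same component types; the name pure_llm_pipeline and the simplified template are our assumptions, not code from the demo repo:

```python
# Assumed retrieval-free pipeline for the no-context answer: same LLM and
# answer builder, but the prompt contains only the question (no documents).
pure_llm_pipeline = Pipeline()
pure_llm_pipeline.add_component(
    "prompt_builder",
    PromptBuilder(template="You are a helpful bot. Answer the question as truthfully as possible.\n\nQuestion: {{question}}"),
)
pure_llm_pipeline.add_component("llm", OpenAIGenerator(api_key=OPENAI_API_KEY, model="gpt-5"))
pure_llm_pipeline.add_component("answer_builder", AnswerBuilder())
pure_llm_pipeline.connect("prompt_builder.prompt", "llm.prompt")
pure_llm_pipeline.connect("llm.replies", "answer_builder.replies")
pure_llm_pipeline.connect("llm.meta", "answer_builder.meta")
```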
```python
# Stream the response from the pure LLM (no retrieved context).
# Add a placeholder for streaming the response.
with st.chat_message("ai", avatar="🤖"):
    message_placeholder_pure_llm = st.empty()
    pure_llm_result = pure_llm_pipeline.run(
        {
            "prompt_builder": {"question": question},
            "answer_builder": {"query": question},
        }
    )
    pure_llm_response = pure_llm_result["answer_builder"]["answers"][0].data
    message_placeholder_pure_llm.markdown(pure_llm_response)
    st.session_state.messages.append({"role": "assistant", "content": pure_llm_response, "avatar": "🤖"})
```

Congratulations! You've successfully built a PDF Chat application using Couchbase Hyperscale Vector Index, Haystack, and OpenAI. This application demonstrates the power of combining high-performance vector search with large language models to create an intelligent, context-aware chat interface for PDF documents.
Through this tutorial, you've learned how to:

- Connect to a Couchbase cluster with the CouchbaseQueryDocumentStore
- Build a Haystack indexing pipeline that converts, cleans, splits, and embeds PDF content
- Let the application create a Hyperscale vector index automatically using SQL++
- Build a RAG pipeline that retrieves relevant chunks via APPROX_VECTOR_DISTANCE() and generates answers with OpenAI
- Wire everything into an interactive Streamlit chat interface
We hope this tutorial has given you a solid foundation for building AI-powered document interaction systems with Couchbase Hyperscale Vector Index and Haystack. Happy coding!