This tutorial will guide you through building a complete Retrieval-Augmented Generation (RAG) application from scratch using Next.js, the Mastra AI framework, and Couchbase for vector search. We'll start by getting the pre-built application running and then break down how each part works so you can build it yourself.
Mastra is an open-source TypeScript agent framework designed to provide the necessary primitives for building AI applications and features. It allows you to create AI agents with memory, equip them with tools (functions they can call), and chain LLM calls in deterministic workflows.
This framework is integral to the tutorial, as it powers the AI orchestration and workflow management, enabling the creation of a robust RAG application with Couchbase and Next.js.
First, let's get the completed application running to see what we're building.
Get the project code and install the necessary dependencies.
git clone https://github.com/couchbase-examples/mastra-nextJS-quickstart.git
cd mastra-nextJS-quickstart
npm install
Before running, you need to configure your environment:

Couchbase: connection details for a running cluster, plus the bucket, scope, and collection you want the app to use.
OpenAI: an API key, used for both embeddings and chat completions.
.env File: Create a .env file in the root of the project. Copy .env.sample (if available) or use the following template and fill in your credentials.

# Couchbase Vector Store Configuration
COUCHBASE_CONNECTION_STRING=couchbase://localhost
COUCHBASE_USERNAME=Administrator
COUCHBASE_PASSWORD=your_password
COUCHBASE_BUCKET_NAME=your_bucket
COUCHBASE_SCOPE_NAME=your_scope
COUCHBASE_COLLECTION_NAME=your_collection
# Embedding Configuration
EMBEDDING_MODEL=text-embedding-3-small
EMBEDDING_DIMENSION=1536
EMBEDDING_BATCH_SIZE=100
# Chunking Configuration
CHUNK_SIZE=1000
CHUNK_OVERLAP=200
# Vector Index Configuration
VECTOR_INDEX_NAME=document-embeddings
VECTOR_INDEX_METRIC=cosine
# OpenAI Configuration
OPENAI_API_KEY=your_openai_api_key
Start the Next.js development server.
npm run dev
Open your browser to http://localhost:3000. You should see the PDF upload interface.
Mastra is the AI orchestration framework that powers our application's intelligence. It helps define agents, tools, and workflows for complex AI tasks.
Mastra provides a structured way to build AI applications. Instead of writing scattered functions, you define agents, tools, and workflows as composable building blocks.
Our core AI component is the researchAgent. Let's create it in src/mastra/agents/researchAgent.ts.
// src/mastra/agents/researchAgent.ts
import { Agent } from "@mastra/core/agent";
import { openai } from "@ai-sdk/openai";
import { vectorQueryTool } from "../tools/embed";
import { Memory } from "@mastra/memory";
import { LibSQLStore } from "@mastra/libsql";
export const researchAgent = new Agent({
  name: "Research Assistant",
  instructions: `You are a helpful research assistant... Base your responses only on the content provided.`,
  model: openai("gpt-4o-mini"),
  tools: {
    vectorQueryTool,
  },
  memory: new Memory({
    storage: new LibSQLStore({
      url: "file:../../memory.db",
    }),
  }),
});
Here, we define an Agent with several key components that work together to create an intelligent research assistant:
Model: Uses OpenAI's gpt-4o-mini for generating responses. This is a cost-effective model that provides good performance for question-answering tasks while keeping API costs manageable.
Instructions: Clear personality and task guidelines that shape how the agent behaves. These instructions tell the agent to act as a research assistant, focus only on information from the uploaded documents, and always cite sources when providing answers.
Tools: Equipped with vectorQueryTool to search through your uploaded documents. When a user asks a question, the agent automatically uses this tool to find relevant chunks of text from the PDFs you've uploaded, then bases its answer on that retrieved content.
Memory: Uses LibSQLStore to remember conversation history across chat sessions. This creates a local SQLite database file (memory.db) that persists all chat messages, allowing the agent to reference earlier parts of the conversation and maintain context.
The LibSQL database automatically handles storing and retrieving chat messages in the background. This means users can ask follow-up questions like "Can you elaborate on that?" or "What did you say about X earlier?" and the agent will understand what they're referring to from the conversation history.
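To take advantage of this memory, you identify the conversation when invoking the agent. Here is a minimal sketch, assuming the resourceId/threadId call options documented by Mastra (the option names and identifiers below are illustrative and vary across Mastra versions):

// Hypothetical call site - identifiers are illustrative, not from the repo
import { researchAgent } from "./mastra/agents/researchAgent";

async function answerQuestion(question: string, userId: string, chatId: string) {
  const result = await researchAgent.generate(question, {
    resourceId: userId, // assumed option: groups all threads belonging to one user
    threadId: chatId,   // assumed option: scopes memory to a single conversation
  });
  return result.text;
}

Passing the same threadId on every request is what lets the LibSQL-backed memory retrieve earlier messages for that conversation.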
The agent needs a tool to search for information. We create this in src/mastra/tools/embed.ts. This tool is responsible for taking a user's query, embedding it, and searching the vector database.
// src/mastra/tools/embed.ts
import { createTool } from "@mastra/core";
import { openai } from "@ai-sdk/openai";
import { embed } from "ai";
import { z } from "zod";
import { getVectorStore } from "./store";
// ... (Environment variable configuration)
export const vectorQueryTool = createTool({
  id: "vector_query",
  description: "Search for relevant document chunks...",
  inputSchema: z.object({
    query: z.string().describe("The search query..."),
    topK: z.number().optional().default(5),
    minScore: z.number().optional().default(0.1),
  }),
  execute: async (executionContext) => {
    const { query, topK, minScore } = executionContext.context;

    // Generate embedding for the query
    const { embedding: queryEmbedding } = await embed({
      model: openai.embedding(EMBEDDING_CONFIG.model),
      value: query,
    });

    // Perform vector search
    const vectorStore = getVectorStore();
    const results = await vectorStore.query({
      indexName: INDEX_CONFIG.indexName,
      queryVector: queryEmbedding,
      topK,
    });

    // Filter and format results
    const relevantResults = results
      .filter(result => result.score >= minScore)
      .map(result => ({ /* ... format result ... */ }));

    return { /* ... results ... */ };
  },
});
This tool leverages the ai SDK to generate embeddings for user search queries and then utilizes our Couchbase vector store to perform semantic search and retrieve the most relevant text chunks. The process involves:
Query Embedding: The tool takes the user's natural language query and converts it into a high-dimensional vector representation using the same embedding model that was used to process the original documents.
Vector Search: The embedded query is then used to search the Couchbase vector database, which compares the query vector against all stored document embeddings using similarity metrics (cosine similarity, euclidean distance, or dot product).
Result Ranking: Couchbase returns the most similar document chunks ranked by their similarity scores, allowing the tool to identify the most relevant information for the user's query.
Score Filtering: The tool applies a minimum similarity threshold (defaulting to 0.1) to ensure only sufficiently relevant results are returned, filtering out low-quality matches.
This vector search capability is what enables the RAG system to ground AI responses in specific, relevant information from the uploaded documents rather than relying solely on the model's pre-trained knowledge.
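The formatting step elided in the tool above typically maps each match's stored metadata into something the agent can cite. A possible shape, assuming each chunk was upserted with text and source fields in its metadata (these field names are illustrative, not confirmed from the repo):

// Illustrative only - metadata keys depend on what you store at ingestion time
const relevantResults = results
  .filter((result) => result.score >= minScore)
  .map((result) => ({
    text: result.metadata?.text,     // the chunk's text (assumed metadata key)
    source: result.metadata?.source, // e.g. the source PDF's file name (assumed metadata key)
    score: result.score,             // similarity score returned by the vector search
  }));

return {
  results: relevantResults,
  count: relevantResults.length,
};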
Couchbase serves as our high-performance vector database, storing the document embeddings and allowing for fast semantic search.
Couchbase is an excellent choice for RAG applications because it combines a powerful, scalable NoSQL database with integrated vector search capabilities. This means you can store your unstructured metadata and structured vector embeddings in the same place, simplifying your architecture.
We need a way to manage the connection to Couchbase. A singleton pattern is perfect for this, ensuring we don't create unnecessary connections. We'll write this in src/mastra/tools/store.ts.
// src/mastra/tools/store.ts
import { CouchbaseVector } from "@mastra/couchbase";
function createCouchbaseConnection(): CouchbaseVector {
  // ... reads environment variables ...
  return new CouchbaseVector({ /* ... connection config ... */ });
}

let vectorStoreInstance: CouchbaseVector | null = null;

export function getVectorStore(): CouchbaseVector {
  if (!vectorStoreInstance) {
    vectorStoreInstance = createCouchbaseConnection();
  }
  return vectorStoreInstance;
}
The @mastra/couchbase package provides the CouchbaseVector class, which handles all the complexities of interacting with Couchbase for vector operations.
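For reference, here is a sketch of what createCouchbaseConnection might look like, assuming the CouchbaseVector constructor accepts connection fields matching our .env template (check the @mastra/couchbase docs for the exact option names):

// src/mastra/tools/store.ts (sketch - option names assumed from the .env template)
import { CouchbaseVector } from "@mastra/couchbase";

function createCouchbaseConnection(): CouchbaseVector {
  return new CouchbaseVector({
    connectionString: process.env.COUCHBASE_CONNECTION_STRING!,
    username: process.env.COUCHBASE_USERNAME!,
    password: process.env.COUCHBASE_PASSWORD!,
    bucketName: process.env.COUCHBASE_BUCKET_NAME!,
    scopeName: process.env.COUCHBASE_SCOPE_NAME!,
    collectionName: process.env.COUCHBASE_COLLECTION_NAME!,
  });
}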
A key feature of our application is that it automatically creates the necessary vector search index if it doesn't already exist. This logic resides in our PDF ingestion API route.
// src/app/api/ingestPdf/route.ts
// Inside createDocumentEmbeddings function:
const vectorStore = connectToCouchbase();
try {
  await vectorStore.createIndex({
    indexName: INDEX_CONFIG.indexName,
    dimension: EMBEDDING_CONFIG.dimension,
    metric: INDEX_CONFIG.metric,
  });
  console.info('Successfully created search index');
} catch (error) {
  // Continue anyway - index might already exist
  console.warn(`Index creation warning: ${error}`);
}
This ensures that the application is ready for vector search as soon as the first document is uploaded, without any manual setup required in Couchbase.
Now let's connect everything to build the full Retrieval-Augmented Generation pipeline.
The ingestion process is handled by the /api/ingestPdf API route. Here's the step-by-step flow defined in src/app/api/ingestPdf/route.ts:
The POST handler receives the uploaded PDF from the frontend.
The file is saved to the public/assets directory.
The pdf-parse library is used to extract raw text from the PDF buffer.
The text is split into chunks with the MDocument utility. This is crucial for providing focused context to the AI.
const doc = MDocument.fromText(documentText);
const chunks = await doc.chunk({ /* ... chunking config ... */ });
Embeddings are generated for each chunk using embedMany.
The resulting embeddings are stored in Couchbase via vectorStore.upsert() (a sketch of this step follows below).
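The embed-and-store steps might look roughly like this, assuming each chunk exposes a text property and that we keep the chunk text and source file name as metadata (the metadata keys, and the fileName variable, are illustrative assumptions):

// Sketch of the embed-and-store step inside the ingestion route
import { openai } from "@ai-sdk/openai";
import { embedMany } from "ai";

const { embeddings } = await embedMany({
  model: openai.embedding(process.env.EMBEDDING_MODEL ?? "text-embedding-3-small"),
  values: chunks.map((chunk) => chunk.text), // chunks come from doc.chunk() above
});

await vectorStore.upsert({
  indexName: process.env.VECTOR_INDEX_NAME ?? "document-embeddings",
  vectors: embeddings,
  metadata: chunks.map((chunk) => ({
    text: chunk.text, // assumed metadata key, read back later by the query tool
    source: fileName, // assumed: the uploaded PDF's file name
  })),
});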
When a user sends a message in the chat, the /api/chat route takes over.
The POST handler in src/app/api/chat/route.ts receives the user's message history.
It then calls the researchAgent with the conversation history:
// src/app/api/chat/route.ts
const stream = await researchAgent.stream(matraMessages);
The researchAgent, guided by its instructions, determines that it needs to find information and calls its vectorQueryTool.
The retrieved document chunks are passed to gpt-4o-mini, which generates a response based only on the provided information.
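To get that answer back to the browser, the route streams it out. A minimal sketch of the handler, assuming the Mastra stream result exposes the AI SDK-compatible toDataStreamResponse() helper (this helper and the import path are assumptions; adapt to your Mastra and ai versions):

// src/app/api/chat/route.ts (sketch - not the repo's exact handler)
import { researchAgent } from "../../../mastra/agents/researchAgent";

export async function POST(req: Request) {
  const { messages } = await req.json();

  // Let the agent run its tools and stream the answer back
  const stream = await researchAgent.stream(messages);

  // Assumed helper: converts the stream into a Response the chat UI can consume
  return stream.toDataStreamResponse();
}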
The frontend is a Next.js application using React Server Components and Client Components for a modern, responsive user experience.
The file upload functionality is handled by src/components/PDFUploader.tsx.
It uses the react-dropzone library to provide a drag-and-drop interface.
Once a file is selected, it sends a POST request with FormData to our /api/ingestPdf endpoint.
// src/components/PDFUploader.tsx
const uploadPdf = async (event) => {
  // ...
  const data = new FormData();
  data.set("file", selectedFile);

  const response = await fetch("/api/ingestPdf", {
    method: "POST",
    body: data,
  });

  const jsonResp = await response.json();
  router.push("/chatPage" + "?" + createQueryString("fileName", jsonResp.fileName));
};
The chat interface is composed of several components found in src/components/chatPage/. The main logic for handling the conversation would typically use the useChat hook from the ai package to manage message state, user input, and the streaming response from the /api/chat endpoint.
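For example, a chat component wired to that endpoint could look like the following sketch, assuming the useChat hook exported from ai/react; the component structure is illustrative, not the repo's actual markup:

// Sketch of a chat client using useChat (illustrative, not the repo's component)
"use client";

import { useChat } from "ai/react";

export default function Chat() {
  const { messages, input, handleInputChange, handleSubmit } = useChat({
    api: "/api/chat", // streams responses from our chat route
  });

  return (
    <div>
      {messages.map((m) => (
        <p key={m.id}>
          <strong>{m.role}:</strong> {m.content}
        </p>
      ))}
      <form onSubmit={handleSubmit}>
        <input value={input} onChange={handleInputChange} placeholder="Ask about your PDF..." />
      </form>
    </div>
  );
}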
This separation of concerns allows the backend to focus on the heavy lifting of AI and data processing, while the frontend focuses on delivering a smooth user experience.
You can extend the readDocument function in ingestPdf/route.ts to support other file types like .docx or .txt by using different parsing libraries.
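A sketch of what that could look like, assuming readDocument receives the file buffer and name and branches on the extension; mammoth is one possible .docx parser and is not part of the current project:

// Hypothetical extension of readDocument (mammoth is an assumed extra dependency)
import pdf from "pdf-parse";
import mammoth from "mammoth";

async function readDocument(buffer: Buffer, fileName: string): Promise<string> {
  if (fileName.endsWith(".pdf")) {
    const parsed = await pdf(buffer);
    return parsed.text;
  }
  if (fileName.endsWith(".docx")) {
    const result = await mammoth.extractRawText({ buffer });
    return result.value;
  }
  if (fileName.endsWith(".txt")) {
    return buffer.toString("utf-8");
  }
  throw new Error(`Unsupported file type: ${fileName}`);
}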
Explore more of Mastra's capabilities by creating multi-agent workflows, adding more custom tools (e.g., a tool to perform web searches), or implementing more sophisticated memory strategies.
Improve retrieval by experimenting with hybrid search (combining vector search with traditional keyword search), filtering by metadata, or implementing more advanced chunking and embedding strategies.
Congratulations! You've now seen how to build a powerful RAG application with Mastra, Couchbase, and Next.js. Use this foundation to build your own custom AI-powered solutions.