Building a RAG System from Scratch with pgvector and Gemini — Introduction
What This Guide Covers
When you start building LLM-powered applications, one pattern becomes unavoidable: RAG (Retrieval-Augmented Generation). An LLM only knows what it was trained on. Your company’s internal documents, the latest spec sheets, project-specific information: none of that exists in the model. To handle data the model doesn’t know, you need a system that retrieves relevant knowledge at query time and injects it into the prompt context. That’s RAG.
In this guide, we’ll implement a RAG system from scratch using pgvector and Gemini, then extend it step by step through Tool Use, AI Agents, MCP, and cloud deployment.
- Step 1: Embedding · Vector DB · RAG — core implementation
- Step 2: AI Architect perspective — design decisions explained
- Step 3: Tool Use — LLM autonomously searches the DB
- Step 4: AI Agents — combining multiple tools
- Step 5: MCP — exposing tools as a server
- Step 6: Cloud deployment — Render × Supabase
Three Concepts to Understand First
Embedding
Computers can’t measure semantic similarity from raw text. An embedding converts text into a list of numbers (a vector), so that semantically similar texts produce numerically similar vectors.
- “dog” → [0.82, 0.75, 0.10, …] 768 numbers
- “cat” → [0.78, 0.72, 0.12, …] ← similar pattern to “dog”
- “bank” → [0.08, 0.10, 0.85, …] ← completely different
Gemini’s embedding model handles this conversion.
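To make “numerically similar” concrete, here is a minimal sketch that compares the three toy vectors above using cosine similarity. The 3-number vectors are just the truncated illustrative values from the list, not real Gemini embeddings, and `cosine_similarity` is a helper defined here for illustration only.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors (illustrative only; real embeddings are much longer).
dog = [0.82, 0.75, 0.10]
cat = [0.78, 0.72, 0.12]
bank = [0.08, 0.10, 0.85]

dog_cat = cosine_similarity(dog, cat)
dog_bank = cosine_similarity(dog, bank)

print(f"dog vs cat:  {dog_cat:.3f}")
print(f"dog vs bank: {dog_bank:.3f}")
```

With these values, “dog” and “cat” score very close to 1, while “dog” and “bank” score far lower. That gap is what a vector search exploits.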
Vector DB
A regular DB search matches keywords. A vector DB searches by numeric distance between vectors, so it finds semantically related documents even when the exact words don’t match.
- Regular search (misses if keywords don’t match):
  SELECT * FROM docs WHERE body LIKE '%F1 score%';
- Vector search (finds semantically related docs):
  SELECT * FROM docs ORDER BY embedding <=> query_vector LIMIT 3;
Search for “how to measure model performance” and it finds “F1 score calculation”, even though no words match. We use pgvector, a PostgreSQL extension, for this; its <=> operator orders rows by cosine distance.
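As a rough in-memory picture of what `ORDER BY embedding <=> query_vector LIMIT 3` does, the sketch below ranks a handful of documents by cosine distance (1 minus cosine similarity) and keeps the closest three. The document titles, the 3-number vectors, and the `top_k` helper are all made up for illustration. A real setup would store Gemini embeddings in a pgvector column and let PostgreSQL do the ranking.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity; smaller means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Hypothetical documents with hand-picked toy "embeddings".
docs = {
    "F1 score calculation": [0.90, 0.10, 0.05],
    "Precision and recall explained": [0.85, 0.20, 0.10],
    "Office parking rules": [0.05, 0.10, 0.95],
    "Confusion matrix basics": [0.80, 0.30, 0.05],
    "Holiday schedule": [0.10, 0.90, 0.20],
}

# Pretend this is the embedding of "how to measure model performance".
query_vector = [0.90, 0.12, 0.05]


def top_k(query: list[float], documents: dict[str, list[float]], k: int = 3) -> list[str]:
    """Sort by distance to the query and keep the k nearest titles."""
    ranked = sorted(documents.items(), key=lambda item: cosine_distance(item[1], query))
    return [title for title, _ in ranked[:k]]


results = top_k(query_vector, docs)
for title in results:
    print(title)
```

The three evaluation-related documents come back, and none of them shares a keyword with the query text. The ranking depends only on vector distance.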
RAG
LLMs are limited to their training data. RAG is a design pattern that retrieves relevant documents and passes them to the LLM as context, enabling the model to answer questions about data it has never seen.
- [Plain LLM] question → answers from training data only
- [RAG] question → search Vector DB → pass results to LLM → grounded answer
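The RAG flow above can be sketched in a few lines. This is a minimal outline under stated assumptions, not the implementation used later in the series: `fake_retrieve` and `fake_generate` are hypothetical stand-ins for the pgvector search and the Gemini call, and the prompt wording is just one plausible choice.

```python
from typing import Callable


def build_prompt(question: str, passages: list[str]) -> str:
    """Inject the retrieved passages into the prompt as numbered context."""
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


def answer_with_rag(
    question: str,
    retrieve: Callable[[str], list[str]],
    generate: Callable[[str], str],
) -> str:
    """question -> search -> pass results to the LLM -> grounded answer."""
    passages = retrieve(question)
    prompt = build_prompt(question, passages)
    return generate(prompt)


# Hypothetical stand-ins so the sketch runs without a database or API key.
def fake_retrieve(question: str) -> list[str]:
    return ["The F1 score is the harmonic mean of precision and recall."]


def fake_generate(prompt: str) -> str:
    return f"(model reply to a {len(prompt)}-character prompt)"


question = "How do we measure model performance?"
prompt = build_prompt(question, fake_retrieve(question))
answer = answer_with_rag(question, fake_retrieve, fake_generate)

print(prompt)
print(answer)
```

Passing `retrieve` and `generate` in as functions keeps the flow itself independent of any particular database or model, so either piece can be swapped out.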
Who This Is For
- Engineers with Python experience who are new to AI application development
- Anyone who wants to understand RAG, Embedding, and vector search through code
- Anyone who wants to learn hands-on from local implementation to cloud deployment
Tools Used (All Free)
| Tool | Purpose | Free Tier |
|---|---|---|
| Google Gemini API | Embedding generation · answer generation | 1,500 requests/day |
| pgvector (PostgreSQL extension) | Vector storage · search | Unlimited (local) |
| Docker | Run pgvector locally | Unlimited |
| Python 3.12 | Implementation language | - |
| Render | Deploy MCP server | Free web service (with sleep) |
| Supabase Cloud | Hosted PostgreSQL with pgvector | 500 MB, no time limit |
Where This Fits in the AI Architect Roadmap
This guide focuses on the Applied and Design phases: the first major implementation step after learning the fundamentals (LLM basics, prompt engineering, API/SDK usage).
| Topic | What we implement |
|---|---|
| ✓ RAG | Full RAG pipeline with pgvector and Gemini |
| ✓ Embedding | Text-to-vector conversion with Gemini Embedding API |
| ✓ Vector DB | Cosine similarity search with pgvector |
The next article starts with environment setup and the first implementation.
Series Index
- Introduction (this article)
- RAG · Embedding · Vector DB Implementation
- Reading RAG Design from an AI Architect’s Perspective
- Tool Use: Letting the LLM Search Autonomously
- AI Agents: Combining Multiple Tools
- MCP: Exposing pgvector Search as an MCP Server
- Cloud Deployment: Render × Supabase
- Wrap-up and Next Steps
Source code: github.com/qameqame/pgvector-tutorial