Building a RAG System from Scratch with pgvector and Gemini — Introduction

What This Guide Covers

When you start building LLM-powered applications, one pattern becomes unavoidable: RAG (Retrieval-Augmented Generation). LLMs only know what they were trained on. Your company’s internal documents, the latest spec sheets, project-specific information: none of that exists in the model. To handle data the model doesn’t know, you need a system that retrieves relevant knowledge at query time and injects it into the prompt context. That’s RAG.

In this guide, we’ll implement a RAG system from scratch using pgvector and Gemini, then extend it step by step with Tool Use, AI Agents, MCP (Model Context Protocol), and cloud deployment.

Step 1: Embedding · Vector DB · RAG — core implementation
Step 2: AI Architect perspective — design decisions explained
Step 3: Tool Use — LLM autonomously searches the DB
Step 4: AI Agents — combining multiple tools
Step 5: MCP — exposing tools as a server
Step 6: Cloud deployment — Render × Supabase

Three Concepts to Understand First

Embedding

Computers can’t measure “semantic similarity” from raw text. An embedding converts text into a list of numbers (a vector) so that semantically similar text produces numerically similar vectors.

“dog” → [0.82, 0.75, 0.10, …] (768 numbers in total; values are illustrative)
“cat” → [0.78, 0.72, 0.12, …] ← similar pattern to “dog”
“bank” → [0.08, 0.10, 0.85, …] ← completely different

Gemini’s embedding model handles this conversion.

Vector DB

A regular DB searches by keyword matching. A vector DB searches by numeric distance between vectors, so it finds semantically related documents even when the exact words don’t match.

Regular search (misses if keywords don’t match): SELECT * FROM docs WHERE body LIKE '%F1 score%';
Vector search (finds semantically related docs): SELECT * FROM docs ORDER BY embedding <=> query_vector LIMIT 3;

Search for “how to measure model performance” and vector search can still find “F1 score calculation”, even though no words match. We use pgvector, a PostgreSQL extension, for this: its <=> operator returns the cosine distance between two vectors, and query_vector stands for the embedding of the search text.

RAG

LLMs are limited to their training data. RAG is a design pattern that retrieves relevant documents and passes them to the LLM as context, enabling the model to answer questions about data it has never seen.

[Plain LLM] question → answers from training data only
[RAG] question → search Vector DB → pass results to LLM → grounded answer
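The RAG flow can be sketched as two small functions. Assumptions: `retrieve` and `generate` are hypothetical callables standing in for the pgvector search and the Gemini answer call, and `fake_retrieve` / `fake_generate` are stubs so the sketch runs without a database or API key. The prompt wording is illustrative, not the series’ actual prompt.

```python
from typing import Callable


def build_rag_prompt(question: str, contexts: list[str]) -> str:
    """Inject retrieved documents into the prompt so the LLM answers from them."""
    context_block = "\n".join(f"- {c}" for c in contexts)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {question}"
    )


def answer_with_rag(
    question: str,
    retrieve: Callable[[str], list[str]],
    generate: Callable[[str], str],
) -> str:
    """question -> search Vector DB -> pass results to LLM -> grounded answer."""
    contexts = retrieve(question)  # e.g. embed the question, ORDER BY <=> LIMIT 3
    prompt = build_rag_prompt(question, contexts)
    return generate(prompt)  # e.g. a Gemini text-generation call


# Hypothetical stubs so the sketch runs offline.
def fake_retrieve(question: str) -> list[str]:
    return ["F1 score is the harmonic mean of precision and recall."]


def fake_generate(prompt: str) -> str:
    return f"(an LLM would answer from a {len(prompt)}-character prompt)"


question = "How do I measure model performance?"
print(build_rag_prompt(question, fake_retrieve(question)))
print(answer_with_rag(question, fake_retrieve, fake_generate))
```

Passing retrieval and generation in as functions keeps the pipeline itself unchanged whether `retrieve` queries a local Docker pgvector instance or a hosted one.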

Who This Is For

Engineers with Python experience who are new to AI application development
Anyone who wants to understand RAG, Embedding, and vector search through code
Anyone who wants to learn hands-on from local implementation to cloud deployment

Tools Used (All Free)

Tool	Purpose	Free Tier (at time of writing)
Google Gemini API	Embedding generation · answer generation	1,500 requests/day
pgvector (PostgreSQL extension)	Vector storage · search	Unlimited (local)
Docker	Run pgvector locally	Unlimited
Python 3.12	Implementation language	-
Render	Deploy the MCP server	Free web service (spins down when idle)
Supabase Cloud	Hosted PostgreSQL with pgvector	500 MB database, no time limit

Where This Fits in the AI Architect Roadmap

This guide focuses on the Applied and Design phases: the first major implementation step after learning the fundamentals (LLM basics, Prompt Engineering, API/SDK usage).

Topic	What we implement
✓ RAG	Full RAG pipeline with pgvector and Gemini
✓ Embedding	Text-to-vector conversion with Gemini Embedding API
✓ Vector DB	Cosine similarity search with pgvector

Let’s get started in the next article with environment setup and the first implementation.

Series Index

Introduction (this article)
RAG · Embedding · Vector DB Implementation
Reading RAG Design from an AI Architect’s Perspective
Tool Use — Letting the LLM Search Autonomously
AI Agents — Combining Multiple Tools
MCP — Exposing pgvector Search as an MCP Server
Cloud Deployment — Render × Supabase
Wrap-up and Next Steps

Source code: github.com/qameqame/pgvector-tutorial