Building a RAG System from Scratch with pgvector and Gemini — Introduction

What This Guide Covers

When you start building LLM-powered applications, one pattern becomes unavoidable: RAG (Retrieval-Augmented Generation). An LLM only knows what was in its training data. Your company’s internal documents, the latest spec sheets, project-specific information: none of that is in the model. To answer questions about data the model has never seen, you need a system that retrieves the relevant knowledge at query time and injects it into the prompt context. That system is RAG.

In this guide, we’ll implement a RAG system from scratch using pgvector and Gemini, then extend it step by step with Tool Use, AI Agents, MCP (Model Context Protocol), and cloud deployment.

  • Step 1: Embedding · Vector DB · RAG — core implementation
  • Step 2: AI Architect perspective — design decisions explained
  • Step 3: Tool Use — LLM autonomously searches the DB
  • Step 4: AI Agents — combining multiple tools
  • Step 5: MCP — exposing tools as a server
  • Step 6: Cloud deployment — Render × Supabase

Three Concepts to Understand First

Embedding

Computers can’t measure semantic similarity directly from raw text. An embedding model converts text into a fixed-length list of numbers (a vector), and semantically similar text produces numerically similar vectors.

  • “dog” → [0.82, 0.75, 0.10, …] (768 values in total)
  • “cat” → [0.78, 0.72, 0.12, …] ← similar pattern to “dog”
  • “bank” → [0.08, 0.10, 0.85, …] ← completely different

Gemini’s embedding model handles this conversion.
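To make "numerically similar" concrete, here is a minimal sketch that scores the example vectors with cosine similarity, the measure behind pgvector’s <=> operator (which returns cosine distance, i.e. 1 minus the similarity). It assumes toy 3-dimensional vectors built from only the first three values shown above; real Gemini embeddings are far longer, and these numbers are illustrative, not actual model output.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction, near 0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors (assumption): only the first three values from the
# example above. Real embeddings have hundreds of dimensions.
vectors = {
    "dog": [0.82, 0.75, 0.10],
    "cat": [0.78, 0.72, 0.12],
    "bank": [0.08, 0.10, 0.85],
}

dog_cat = cosine_similarity(vectors["dog"], vectors["cat"])
dog_bank = cosine_similarity(vectors["dog"], vectors["bank"])
print(f"dog vs cat:  {dog_cat:.3f}")   # close to 1.0: similar direction
print(f"dog vs bank: {dog_bank:.3f}")  # much lower: different direction
```

Cosine similarity compares direction rather than magnitude, which is why it is a common default for comparing text embeddings.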

Vector DB

A regular database query matches keywords literally. A vector database searches by numeric distance between vectors, so it can find semantically related documents even when the exact words don’t match.

  • Regular search (misses if keywords don’t match): SELECT * FROM docs WHERE body LIKE '%F1 score%';
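  • Vector search (finds semantically related docs): SELECT * FROM docs ORDER BY embedding <=> query_vector LIMIT 3; here <=> is pgvector’s cosine distance operator and query_vector is the embedding of the search text.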

A search for “how to measure model performance” can surface a document about “F1 score calculation” even though the two share no words. We use pgvector, a PostgreSQL extension, to store and search these vectors.
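The contrast between the two queries can be reproduced in plain Python. This is a sketch, not pgvector itself: the documents and their 3-dimensional embeddings are hypothetical hand-picked values (a real system would get them from the embedding model), LIKE '%…%' is modeled as a substring test, and the vector query is modeled as sorting by cosine distance and keeping the top 3, which is what ORDER BY embedding <=> query_vector LIMIT 3 asks the database to do.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Cosine distance = 1 - cosine similarity (the quantity pgvector's <=> returns)."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


# Hypothetical documents with hand-made toy embeddings (not real model output).
docs = [
    {"body": "F1 score calculation: harmonic mean of precision and recall", "embedding": [0.90, 0.80, 0.10]},
    {"body": "Confusion matrix basics for classifiers", "embedding": [0.80, 0.70, 0.20]},
    {"body": "Office lunch menu for next week", "embedding": [0.05, 0.10, 0.95]},
    {"body": "How to rotate database credentials", "embedding": [0.10, 0.90, 0.60]},
]


def keyword_search(text: str) -> list[str]:
    # Models: SELECT body FROM docs WHERE body LIKE '%<text>%';
    return [d["body"] for d in docs if text in d["body"]]


def vector_search(query_vector: list[float], limit: int = 3) -> list[str]:
    # Models: SELECT body FROM docs ORDER BY embedding <=> query_vector LIMIT 3;
    ranked = sorted(docs, key=lambda d: cosine_distance(d["embedding"], query_vector))
    return [d["body"] for d in ranked[:limit]]


# Assumption: pretend this toy vector is the embedding of the query text.
query_text = "how to measure model performance"
query_vector = [0.85, 0.75, 0.15]

print(keyword_search(query_text))   # no document contains this exact phrase, so this prints []
print(vector_search(query_vector))  # the F1 score document ranks first
```

In the real system the ranking runs inside PostgreSQL, so only the top rows travel back to the application.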

RAG

LLMs are limited to their training data. RAG is a design pattern that retrieves relevant documents and passes them to the LLM as context, so the model can answer questions about data it has never seen.

  • [Plain LLM] question → answers from training data only
  • [RAG] question → search Vector DB → pass results to LLM → grounded answer
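The [RAG] flow above can be sketched as a small function. This is a sketch under stated assumptions: rag_answer, build_rag_prompt, retrieve, and generate are hypothetical names, with retrieve standing in for the pgvector query and generate standing in for the Gemini call, and the prompt template is one reasonable choice rather than a fixed format. Passing the two steps in as callables keeps the pipeline testable without a database or API key.

```python
from typing import Callable


def build_rag_prompt(question: str, contexts: list[str]) -> str:
    """Place retrieved documents ahead of the question so the LLM answers from them."""
    numbered = "\n".join(f"[{i}] {text}" for i, text in enumerate(contexts, start=1))
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say you don't know.\n\n"
        f"Context:\n{numbered}\n\n"
        f"Question: {question}\nAnswer:"
    )


def rag_answer(
    question: str,
    retrieve: Callable[[str, int], list[str]],  # hypothetical: question -> top-k document texts
    generate: Callable[[str], str],  # hypothetical: prompt -> LLM answer
    top_k: int = 3,
) -> str:
    contexts = retrieve(question, top_k)  # 1. search the vector DB
    prompt = build_rag_prompt(question, contexts)  # 2. inject results into the prompt
    return generate(prompt)  # 3. grounded answer from the LLM


# Stand-in functions for illustration only (no database or API involved).
def fake_retrieve(question: str, k: int) -> list[str]:
    return ["Our refund window is 30 days.", "Refunds go to the original payment method."][:k]


def fake_generate(prompt: str) -> str:
    # A real implementation would send the prompt to Gemini here.
    return "The refund window is 30 days." if "30 days" in prompt else "I don't know."


print(rag_answer("What is the refund window?", fake_retrieve, fake_generate))
```

Swapping fake_retrieve and fake_generate for real pgvector and Gemini calls changes nothing in rag_answer itself, which is the point of the pattern: retrieval and generation are separate, replaceable steps.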

Who This Is For

  • Engineers with Python experience who are new to AI application development
  • Anyone who wants to understand RAG, Embedding, and vector search through code
  • Anyone who wants to learn hands-on from local implementation to cloud deployment

Tools Used (All Free)

Tool | Purpose | Free tier
Google Gemini API | Embedding generation · answer generation | 1,500 requests/day
pgvector (PostgreSQL extension) | Vector storage · search | Unlimited (local)
Docker | Run pgvector locally | Unlimited
Python 3.12 | Implementation language | n/a
Render | Deploy the MCP server | Free web service (sleeps when idle)
Supabase Cloud | Hosted pgvector database | 500 MB, permanently free

Where This Fits in the AI Architect Roadmap

This guide focuses on the Applied and Design phases: the first major implementation step after learning the fundamentals (LLM basics, prompt engineering, API/SDK usage).

Topic | What we implement
✓ RAG | Full RAG pipeline with pgvector and Gemini
✓ Embedding | Text-to-vector conversion with the Gemini Embedding API
✓ Vector DB | Cosine similarity search with pgvector

In the next article, we’ll set up the environment and build the first implementation.


Series Index

  1. Introduction (this article)
  2. RAG · Embedding · Vector DB Implementation
  3. Reading RAG Design from an AI Architect’s Perspective
  4. Tool Use: Letting the LLM Search Autonomously
  5. AI Agents: Combining Multiple Tools
  6. MCP: Exposing pgvector Search as an MCP Server
  7. Cloud Deployment: Render × Supabase
  8. Wrap-up and Next Steps

Source code: github.com/qameqame/pgvector-tutorial