Build a Simple RAG App with Telnyx AI Inference
Build a Simple RAG App with Telnyx AI Inference
使用 Telnyx AI Inference 构建简单的 RAG 应用
RAG is one of those patterns that sounds more complicated than it has to be. At its core, retrieval-augmented generation is just: Store some documents, Embed the user’s question, Find the most relevant docs, Send those docs to the model as context, Return an answer with sources. I built a small Python example that shows that flow end to end with Telnyx AI Inference. Repo: https://github.com/team-telnyx/telnyx-code-examples/tree/main/build-rag-with-telnyx-inference-python
RAG(检索增强生成)是那种听起来比实际更复杂的模式。其核心逻辑非常简单:存储一些文档、对用户的问题进行向量化(Embedding)、查找最相关的文档、将这些文档作为上下文发送给模型、最后返回带有来源的答案。我构建了一个小型 Python 示例,展示了如何使用 Telnyx AI Inference 端到端地实现这一流程。代码仓库:https://github.com/team-telnyx/telnyx-code-examples/tree/main/build-rag-with-telnyx-inference-python
What it does
它能做什么
The app exposes a Flask API for asking questions against a tiny in-memory knowledge base. You send a question like: { "question": "Production signup broke after rotating an API key. Logs show 401 errors. What should we check first?" }. The app creates an embedding for the question, compares it against embeddings for the sample documents, retrieves the most relevant sources, sends those sources to a chat model, and returns a grounded answer plus source titles.
该应用提供了一个 Flask API,用于针对一个微型的内存知识库进行提问。你可以发送如下问题:{ "question": "Production signup broke after rotating an API key. Logs show 401 errors. What should we check first?" }。应用会为该问题创建向量,将其与示例文档的向量进行比对,检索出最相关的来源,将这些来源发送给聊天模型,并返回一个基于事实的答案以及来源标题。
Why this pattern is useful
为什么这种模式很有用
A normal LLM call only knows what is in the prompt and the model’s training data. RAG lets your app answer with your own docs, policies, product information, support notes, or internal knowledge base. That makes it useful for things like: support assistants, internal docs search, onboarding copilots, product Q&A, troubleshooting workflows, and agent tools that need source-grounded answers.
普通的 LLM 调用仅了解提示词和模型训练数据中的内容。RAG 则允许你的应用使用你自己的文档、政策、产品信息、支持笔记或内部知识库来回答问题。这使得它在以下场景中非常有用:支持助手、内部文档搜索、入职引导助手、产品问答、故障排除工作流,以及需要基于来源回答的智能体工具。
How the example works
示例的工作原理
The example keeps the moving parts intentionally small. There is an in-memory DOCUMENTS list. On the first request, the app creates embeddings for those documents and caches them. When a user asks a question, the app embeds the question, compares it to the document embeddings, and sends the best matches to the model. The answer response includes source titles, so you can see what context the app used instead of treating the model like a black box.
该示例特意保持了极简的架构。它包含一个内存中的 DOCUMENTS 列表。在首次请求时,应用会为这些文档创建向量并进行缓存。当用户提问时,应用会对问题进行向量化,将其与文档向量进行比对,并将最佳匹配项发送给模型。回答中包含了来源标题,因此你可以清楚地看到应用使用了哪些上下文,而不是将模型视为一个黑盒。
Try it
尝试一下
Clone the repo: git clone https://github.com/team-telnyx/telnyx-code-examples.git
cd telnyx-code-examples/build-rag-with-telnyx-inference-python
克隆仓库:git clone https://github.com/team-telnyx/telnyx-code-examples.git
cd telnyx-code-examples/build-rag-with-telnyx-inference-python
Install dependencies and run the app:
pip install -r requirements.txt
cp .env.example .env
python app.py
安装依赖并运行应用:
pip install -r requirements.txt
cp .env.example .env
python app.py
Ask a question:
curl -X POST http://localhost:5000/rag/ask \ -H "Content-Type: application/json" \ -d '{ "question": "Production signup broke after rotating an API key. Logs show 401 errors. What should we check first?" }'
提问:
curl -X POST http://localhost:5000/rag/ask \ -H "Content-Type: application/json" \ -d '{ "question": "Production signup broke after rotating an API key. Logs show 401 errors. What should we check first?" }'
Why I like this example
为什么我喜欢这个示例
It is deliberately small, but it gives you the core pieces of a real RAG workflow: embeddings, retrieval, source grounding, and chat completion, all behind a clean API surface. From there, you could swap the in-memory docs for a vector database, pull content from product docs, or turn it into a support assistant. The Telnyx code examples repo is also structured to be agent-readable, so coding agents can inspect these examples and help you extend them into fuller applications.
它虽然小巧,但涵盖了真实 RAG 工作流的核心要素:向量化、检索、来源溯源和聊天补全,且都封装在简洁的 API 接口之后。在此基础上,你可以将内存文档替换为向量数据库,从产品文档中提取内容,或将其转化为支持助手。Telnyx 代码示例仓库的结构也便于智能体阅读,因此编码智能体可以检查这些示例,并帮助你将其扩展为更完整的应用程序。