Amplify the Expert: A Philosophy for Building Enterprise RAG
Enterprise Document Intelligence [Vol.1 #M1]: The thesis behind every architectural choice in this series.
This article is the manifesto of Enterprise Document Intelligence, a series that builds an enterprise RAG system from four bricks: document parsing, question parsing, retrieval, and generation. Its thesis, amplify the expert, sits behind every architectural choice in the series.
If you have to remember one idea from this series, it is this: enterprise RAG amplifies the expert. It does not replace them. This piece sets down the thesis up front, before the techniques start, because every later article derives from it. Most architectural mistakes in production RAG follow from forgetting this. Once you accept it, the rest of the series stops being a catalog of techniques and starts looking like a coherent argument.
1. The thesis in one sentence
This series is about building RAG systems that amplify enterprise experts working with their own documents, not about building general-purpose document intelligence that replaces them. The premise sounds modest but it changes most architectural choices.
The system’s job is to scale judgment that already exists in human form: the lawyer who has read a thousand contracts, the underwriter who reaches for the deductible clause on reflex, the compliance officer who knows which sentence the auditor will ask about. Those people are the source of truth. The system handles volume, finds passages in seconds, compares documents systematically. It does not pretend to be the expert.
Every other position the series defends derives from this thesis. Vector stores are a fallback because the expert already knows the keywords. Deterministic dispatchers beat autonomous agents because the expert needs to audit what happened. Expert dictionaries beat fine-tuned embeddings because the expert’s vocabulary is richer than any IDF formula or vector space could capture.
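To make the dispatcher and dictionary claims concrete, here is a minimal sketch, assuming a tiny hand-written expert dictionary. The route names, trigger terms, and the `dispatch` function are all hypothetical and are not taken from the series. The sketch only illustrates why a fixed rule table is auditable: every decision carries the terms that caused it.

```python
from dataclasses import dataclass, field

# Hypothetical expert dictionary: trigger terms an expert might list, per route.
# Both the terms and the route names are illustrative assumptions.
ROUTES = [
    ("deductible_lookup", ("deductible", "excess", "retention")),
    ("termination_lookup", ("termination", "cancellation", "notice period")),
]


@dataclass
class Decision:
    route: str
    matched_terms: list = field(default_factory=list)  # the audit trail


def dispatch(question: str) -> Decision:
    """Pick a route by fixed rules, checked in order, and record why it was picked."""
    text = question.lower()
    for route, terms in ROUTES:
        # Plain substring match; a real system would likely match on word boundaries.
        hits = [term for term in terms if term in text]
        if hits:
            return Decision(route, hits)
    # No rule fired: fall back explicitly instead of guessing.
    return Decision("fallback_toc_navigation", [])


decision = dispatch("What is the deductible for water damage?")
print(decision.route, decision.matched_terms)
```

Because the rule table is ordinary data, an expert can read it, correct it, and see exactly which term triggered which route. An agent that chooses its own steps cannot offer that directly.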
2. The gap between two camps
Most enterprises run two parallel realities on the same documents: an opaque vector-store pipeline the IT camp built, and an expert who still searches with Ctrl+F because nothing the IT camp shipped earned their trust. The series is the bridge between the two.
On the IT side sits the camp that vendors and conference talks have told to chunk every document, push it into a vector store, embed every query, and trust that cosine similarity will find the right passage. They build the system, they run it, and if you ask them precisely why a given chunk came back, very few can answer. The architecture is opaque even to the people who deployed it.
On the expert side, decades of accumulated reading. Lawyers who have read a thousand contracts. Underwriters who have priced ten thousand policies. Compliance officers who can name the clause an auditor will ask about before the auditor walks in. Ask them how they search a document. The honest answer is almost always the same. They open the PDF, hit Ctrl+F, type a keyword they know works in their corpus, find the passage. If the keyword misses, they go to the table of contents, locate the right section, scan it line by line. That is the retrieval method that decades of expertise has converged on.
The gap is not benign. The IT-camp system is opaque even to the people who built it; the expert-camp method is precise but does not scale. The series’s natural move is to bring them together: take the method the expert already trusts (keyword search anchored on real vocabulary, then TOC navigation when keywords miss) and use the LLM to scale it.
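The expert's method described above (literal keyword search first, table-of-contents navigation when the keyword misses) can be sketched in a few lines. This is an illustration under stated assumptions, not the series' implementation: the `Section` structure, the function names, and the sample policy text are hypothetical, and the document is assumed to be already parsed into titled sections. The LLM is deliberately left out of this sketch.

```python
from dataclasses import dataclass


@dataclass
class Section:
    title: str   # heading as it appears in the table of contents
    lines: list  # the section body, one string per line


def keyword_hits(sections, keywords):
    """Ctrl+F equivalent: every line that literally contains one of the keywords."""
    wanted = [k.lower() for k in keywords]
    hits = []
    for section in sections:
        for number, line in enumerate(section.lines, start=1):
            if any(k in line.lower() for k in wanted):
                hits.append((section.title, number, line))
    return hits


def toc_fallback(sections, topic_words):
    """Keywords missed: pick sections whose title mentions the topic, to be read in full."""
    wanted = [w.lower() for w in topic_words]
    return [s for s in sections if any(w in s.title.lower() for w in wanted)]


def retrieve(sections, keywords, topic_words):
    """Try keywords first; fall back to the table of contents. Report which path ran."""
    hits = keyword_hits(sections, keywords)
    if hits:
        return {"method": "keyword", "hits": hits}
    titles = [s.title for s in toc_fallback(sections, topic_words)]
    return {"method": "toc", "sections": titles}


policy = [
    Section("1. Definitions", ["Insured means the policyholder."]),
    Section("4. Deductibles", ["The amount borne by the insured is EUR 500 per claim."]),
]
print(retrieve(policy, ["franchise"], ["deductible"]))
```

The returned `method` field records which path produced the result, so the expert can see whether a passage came from a keyword hit or from section navigation.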
LLMs are now strong enough that the retrieval stage no longer has to be clever to compensate. The 2022-era reflex of stacking embedding tricks on top of a weak generation model was solving a problem that no longer exists at the same intensity. Retrieval can stay close to the expert’s natural workflow without losing answer quality.
Underneath the two camps sits a distinction worth stating plainly. There are two ways to answer a question, and they are not the same operation:
- From the model’s parametric memory. You write the question, the model answers, one step. That is a chatbot, and for general knowledge it is enough.
- From a document. Two phases that have to stay apart. First the passage is found, by keyword the way the expert reaches for Ctrl+F, not by handing the model the raw question. Only then is the question answered, against the document rather than against the model’s training.
Enterprise work is the second case, and the rest of the series keeps the two phases apart. Mirroring the expert’s method this closely is not cosmetic. The point is not that vector stores are wrong everywhere; the point is that adopting a method the expert cannot recognize, on documents the expert knows by heart, is the fastest way to lose their trust. Without trust, the system does not get used, and a system that is not used has zero value regardless of how impressive its benchmarks look.
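The separation of the two phases can be sketched as follows. This is a minimal illustration, not the series' code: the function names, the prompt wording, and the `generate` parameter are assumptions. `generate` stands in for whatever model call a real system would use, and it is passed in as a plain callable so that no specific LLM API is implied.

```python
def find_passages(document_lines, keywords):
    """Phase 1: locate passages by literal keyword match. No model is involved."""
    wanted = [k.lower() for k in keywords]
    return [
        (number, line)
        for number, line in enumerate(document_lines, start=1)
        if any(k in line.lower() for k in wanted)
    ]


def build_prompt(question, passages):
    """Phase 2 input: the question plus only the passages found in phase 1."""
    quoted = "\n".join(f"[line {number}] {line}" for number, line in passages)
    return (
        "Answer using only the passages below. "
        "If they do not contain the answer, say so.\n\n"
        f"Passages:\n{quoted}\n\nQuestion: {question}"
    )


def answer_from_document(question, keywords, document_lines, generate):
    """Run phase 1, then phase 2. `generate` is a hypothetical prompt-to-text callable."""
    passages = find_passages(document_lines, keywords)
    if not passages:
        # Nothing found: stop here instead of letting the model answer from memory.
        return {"answer": None, "passages": []}
    answer = generate(build_prompt(question, passages))
    return {"answer": answer, "passages": passages}
```

The keywords are a separate argument from the question on purpose: retrieval runs on vocabulary the expert supplies, and the model sees the question only after the passages have been fixed. Returning the passages alongside the answer lets the expert check the source lines directly.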
3. The historical parallel: machine learning ten years ago
RAG is repeating the enterprise ML wave of 2015 to 2020 verbatim. The same vendor-copying reflex, the same generic templates, the same failure modes. What worked then, and what will work for RAG now, is domain-specific work anchored on existing expertise.
Between 2015 and 2020, enterprises tried to build ML systems by copying Google, DeepMind, and Facebook. “Build a model that learns” was the slogan. Most enterprise ML projects from that era failed to reach production. Gartner put the figure at around 85% in 2019, and the practitioners who lived through the wave cite numbers in the same range. They failed for…