Embeddings Aren’t Magic: The Predictable Failure Modes of RAG Retrieval
LLM Applications | Enterprise Document Intelligence [Vol. 1 #2]
Why the same vector search that handles synonyms and paraphrase silently fails on negation, exact identifiers, and your company’s acronyms, and what to use when it does.
angela shi | May 30, 2026 | 44 min read
Two scenes, both familiar.
Scene 1: A RAG system over a few hundred pages of policy documents goes live for a small team. The first thing that impresses everyone: it handles paraphrase. Someone asks “how do I cancel?” The document never uses the word “cancel”; it uses “termination procedures”, and the system finds it anyway. Another user asks in French while the policy is in English, and the right page comes back. A typo here, a phonetic spelling there, no problem. After a few days the team is genuinely impressed. The closest thing RAG has to magic is sitting in front of them, and it didn’t take any hand-coded synonym table to make it work.
Scene 2: The same system, two weeks later. The user asks “what’s the rule on contractor overtime?” The system answers “I couldn’t find that information.” The user, who happens to be the business expert who wrote half this manual, frowns, opens the PDF, types “non-employee labor” into Ctrl-F, and lands on the exact paragraph in three seconds. The right keyword wasn’t “overtime”. It was the term the document actually uses. The expert knew that; the embedding didn’t. Pretty quickly, more cases like this surface. Negation breaks. Exact contract reference numbers break. An internal product code returns the wrong tier. None of it is fixable by swapping the embedding provider.
The position of the series, stated up front: most enterprise reliability gains come from strong upstream filtering (expert keywords, document structure), not from a reranker stacked on top of weak retrieval. The classical stack ranks the layers by cost: cheap embedding similarity at the bottom, an optional cross-encoder reranker in between, the chat-completion LLM on top. None of them is magic; each breaks in specific ways. This article is one piece of the broader Enterprise Document Intelligence Vol. 1 series, which builds enterprise RAG brick by brick from a baseline pipeline to corpus-scale architecture.
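The upstream-filtering idea can be sketched in a few lines. This is a minimal illustration, not the series’ implementation: the glossary, the chunk texts, and the function names `expand_query` and `keyword_filter` are all hypothetical. It assumes a business expert has supplied a mapping from the words users type to the terms the documents actually use, and applies that mapping as an exact-match filter before any similarity ranking.

```python
# Hypothetical expert glossary: user vocabulary -> document vocabulary.
# In practice this would be curated by the people who wrote the documents.
EXPERT_GLOSSARY = {
    "contractor": ["non-employee labor"],
    "cancel": ["termination procedures"],
}


def expand_query(query: str, glossary: dict[str, list[str]]) -> list[str]:
    """Return the document-side terms triggered by words found in the query."""
    lowered = query.lower()
    terms = []
    for user_word, doc_terms in glossary.items():
        if user_word in lowered:
            terms.extend(doc_terms)
    return terms


def keyword_filter(chunks: list[str], terms: list[str]) -> list[str]:
    """Keep only chunks containing at least one expert term (case-insensitive)."""
    if not terms:
        return chunks  # no glossary hit: fall through to plain retrieval
    return [c for c in chunks if any(t.lower() in c.lower() for t in terms)]


# Placeholder chunk texts, invented for this example.
chunks = [
    "Non-employee labor may not exceed 40 billable hours per week.",
    "Employees accrue overtime after 40 hours.",
    "Termination procedures require 30 days written notice.",
]

terms = expand_query("What's the rule on contractor overtime?", EXPERT_GLOSSARY)
candidates = keyword_filter(chunks, terms)
print(candidates)
```

Only the chunks that survive the filter would then be passed to embedding similarity, the optional reranker, and the LLM. The design choice is that the cheapest, most predictable step runs first, so the later layers work on candidates an expert would recognize as relevant.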
1. What embeddings nail
Before the failures, a look at what embeddings do impressively well; the failures only make sense in contrast. An embedding turns a piece of text into a vector: a list of numbers that captures the meaning of the text. Texts with similar meanings end up close together in vector space, even when they share few words. A longer list can carry more nuance. Embeddings have improved with each generation.
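As a concrete picture of “close in vector space”, here is a minimal sketch of cosine similarity, a measure commonly used to compare embeddings. The three-dimensional vectors are hand-made for illustration only; they do not come from any real model, which would typically produce hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors, invented for illustration only (not model output).
toy_embeddings = {
    "how do I cancel?": [0.9, 0.1, 0.0],
    "termination procedures": [0.8, 0.2, 0.1],
    "parking permit renewal": [0.0, 0.2, 0.9],
}

query_text = "how do I cancel?"
query = toy_embeddings[query_text]

# Rank the other texts by similarity to the query, most similar first.
ranked = sorted(
    (text for text in toy_embeddings if text != query_text),
    key=lambda text: cosine_similarity(query, toy_embeddings[text]),
    reverse=True,
)
print(ranked)
```

With these toy values, the query and “termination procedures” point in nearly the same direction, so that text ranks first even though the two strings share no words.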
(Technical code and model comparison details omitted for brevity)
1.1 Conceptual proximity
“car” matches passages about vehicles, automobiles, motor vehicles. “fire damage” finds passages on smoke damage and scorching. “manager approval” matches a clause about executive approval. The model captures the semantic field, not just the surface words. This is what makes embeddings feel powerful: the user does not have to guess the document’s vocabulary; the embedding bridges the rest. Casual queries bridge to formal passages.