Stop Returning Text from RAG: The Typed Answer Contract That Prevents Hallucination


Enterprise Document Intelligence [Vol.1 #8A] – The schema is the contract: every field is a question the pipeline asks the model, and every answer is checkable

Kezhan Shi | Jul 4, 2026 | 31 min read

This article opens the generation brick of Enterprise Document Intelligence, a series that builds an enterprise RAG system from four bricks: document parsing, question parsing, retrieval, and generation. Generation is the fourth and last brick. This is the first of its three parts: the contract, meaning the typed answer schema the model has to fill. The companion articles cover how the call that fills it is assembled (Article 8B, prompt assembly) and how the answer is checked and looped back into the pipeline (Article 8C, validation).

1. The model hallucinates; answer from the passages, not from memory


The generator’s job is to turn the retrieved passages and the parsed question into an answer, and the model will hallucinate on the way. That is not a bug to patch. It is what generative AI does: it predicts the most plausible next token; it does not look anything up. On a topic that saturates its training data, the prediction is reliable. On your contract, seen once or never, it predicts a continuation just as fluent, just as confident, and far more likely to be wrong. You can’t train that away. You can only shrink the room for it.

Most of that room is already closed by the time generation runs. Each brick before it hands generation something clean. Document parsing gives it structured tables, not a garbled text dump. Question parsing gives it a precise question and a declared answer format (the shape and type to return), not a loose string. Retrieval gives it the minimum: the few passages that actually hold the answer, each pinned to a clear anchor on its exact lines. Three bricks, three ways the room to invent got smaller. The ground is prepared, and generation only has to not waste it.

Generation is where you spend that preparation, and the lever is not a smarter prompt (“do not make things up” changes nothing). It is controlled execution. The model answers only from the passages in front of it, in a typed shape, with a citation for every claim. Structured input goes in: passages plus question. Structured output comes out: a typed schema with citations, fidelity flags, and feedback for the pipeline. Ask for “an answer” and the model fills the gaps from memory. Ask for a structured object whose every field is checked against the input, and it has nowhere to invent.
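To make "checked against the input" concrete, here is a minimal sketch of one such check: a citation counts as grounded only if its line range falls inside a passage that was actually supplied and its quote occurs in that passage. The names `Passage`, `Citation`, and `is_grounded` are hypothetical, invented for this illustration; the real validation is the subject of Article 8C and may work differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    start_line: int  # first line of the passage in the source document
    end_line: int    # last line, inclusive
    text: str


@dataclass(frozen=True)
class Citation:
    start_line: int
    end_line: int
    quote: str  # text the model claims to have read


def _normalize(text: str) -> str:
    # Collapse whitespace and case so layout differences do not cause false misses.
    return " ".join(text.split()).lower()


def is_grounded(citation: Citation, passages: list[Passage]) -> bool:
    """True if the cited span lies inside one supplied passage and the quote occurs in it."""
    quote = _normalize(citation.quote)
    if not quote:
        return False  # an empty quote proves nothing
    for p in passages:
        inside = p.start_line <= citation.start_line <= citation.end_line <= p.end_line
        if inside and quote in _normalize(p.text):
            return True
    return False


# Hypothetical data: one retrieved passage and two model citations.
passages = [Passage(120, 124, "The deductible is USD 1,200 per claim.")]
good = Citation(121, 122, "deductible is USD 1,200")
invented = Citation(121, 122, "deductible is USD 2,000")
```

The substring match is deliberately strict: a paraphrased quote fails the check, which is the intended behavior if quotes are meant to be verbatim.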

2. Asking the model for more than “the answer”


The schema is the contract between the pipeline and the model, and it doesn’t have to stop at “the answer”. The minimal RAG pipeline’s AnswerWithEvidence was the minimum that earns the word “RAG”: a direct answer, the evidence start and end, a confidence, and optional quotes and caveats. That works for prose questions. Every field we add past that is another question the schema asks the model, and each earns its place by giving the pipeline something it couldn’t get otherwise.
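As a reminder of that starting point, the minimal contract might look roughly like the sketch below. Only the class name `AnswerWithEvidence` and the list of fields come from the text; the field names, the types, and the 0.0 to 1.0 confidence range are assumptions, and the sketch uses standard-library dataclasses rather than whatever schema library the series relies on.

```python
from dataclasses import dataclass, field


@dataclass
class AnswerWithEvidence:
    answer: str            # the direct answer, as prose
    evidence_start: int    # assumed: first line of the supporting span
    evidence_end: int      # assumed: last line of the supporting span, inclusive
    confidence: float      # assumed to be a score between 0.0 and 1.0
    quotes: list[str] = field(default_factory=list)   # optional
    caveats: list[str] = field(default_factory=list)  # optional

    def __post_init__(self) -> None:
        # Reject objects that cannot be valid before they reach downstream code.
        if self.evidence_start > self.evidence_end:
            raise ValueError("evidence_start must not exceed evidence_end")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0.0 and 1.0")


# Hypothetical example values.
minimal = AnswerWithEvidence(
    answer="The deductible is USD 1,200 per claim.",
    evidence_start=120,
    evidence_end=124,
    confidence=0.8,
)
```

Note that the answer is still a string here: a consumer that needs the number 1200 has to parse it back out, which is the gap the typed values address.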

The rich contract stacks four kinds of fields on top of the minimal one:

Typed values per shape (section 2.1): Amount(value, currency, unit) instead of the string “USD 1,200 per claim”; DateValue(iso, original) instead of “15 March 2024”; TableValue(headers, rows) instead of pipe-separated text. Downstream code never re-parses a string.
Multi-element answers with multi-span citations (section 2.2): many real questions have a list as the answer; many single-element answers have non-contiguous evidence (a definition on page 5 plus an example on page 23). The schema models both directly.
Self-assessment and pipeline-feedback fields (section 2.3): confidence, caveats, answer_found, complete_answer_found, context_structured, llm_discovered_keywords, conflicting_evidence, suggested_clarification. Each of these makes the model emit a signal the pipeline reads to decide its next move.
Programmatic completeness (section 2.4): the one signal we deliberately do not ask the model for. It’s set by the pipeline based on what retrieval was parametrized to include. It is strong because it is deterministic and grounded in document structure, not in the model’s self-rating.
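The four layers can be sketched together as follows. The value types `Amount`, `DateValue`, and `TableValue` and the feedback field names are taken from the list above; everything else is an assumption made for illustration: the field types, the `Span`, `AnswerElement`, and `RichAnswer` classes, the `context_complete` field, and the `stamp_completeness` helper. Standard-library dataclasses stand in for whatever schema library the series actually uses.

```python
from dataclasses import dataclass, field, replace
from datetime import date
from decimal import Decimal
from typing import Optional, Union


@dataclass(frozen=True)
class Amount:
    value: Decimal
    currency: str
    unit: Optional[str] = None  # e.g. "per claim"


@dataclass(frozen=True)
class DateValue:
    iso: str       # normalized form, e.g. "2024-03-15"
    original: str  # wording found in the document

    def __post_init__(self) -> None:
        date.fromisoformat(self.iso)  # raises ValueError on a malformed date


@dataclass(frozen=True)
class TableValue:
    headers: list[str]
    rows: list[list[str]]

    def __post_init__(self) -> None:
        if any(len(row) != len(self.headers) for row in self.rows):
            raise ValueError("every row must have one cell per header")


@dataclass(frozen=True)
class Span:
    start_line: int
    end_line: int


@dataclass(frozen=True)
class AnswerElement:
    value: Union[str, Amount, DateValue, TableValue]
    citations: list[Span]  # several spans allowed: evidence may be non-contiguous


@dataclass(frozen=True)
class RichAnswer:
    elements: list[AnswerElement]  # a list, so multi-element answers fit directly
    # Signals the model fills in (the types here are guesses).
    confidence: float
    answer_found: bool
    complete_answer_found: bool
    context_structured: bool
    caveats: list[str] = field(default_factory=list)
    llm_discovered_keywords: list[str] = field(default_factory=list)
    conflicting_evidence: list[str] = field(default_factory=list)
    suggested_clarification: Optional[str] = None
    # Never requested from the model; the pipeline sets it after the call.
    context_complete: Optional[bool] = None


def stamp_completeness(answer: RichAnswer, retrieved_whole_unit: bool) -> RichAnswer:
    """Pipeline-side step: record whether retrieval was configured to return the whole unit."""
    return replace(answer, context_complete=retrieved_whole_unit)


# Hypothetical example: one typed element backed by two separate spans.
deductible = AnswerElement(
    value=Amount(Decimal("1200"), "USD", "per claim"),
    citations=[Span(120, 121), Span(410, 412)],
)
raw = RichAnswer([deductible], confidence=0.9, answer_found=True,
                 complete_answer_found=True, context_structured=True)
final = stamp_completeness(raw, retrieved_whole_unit=True)
```

Keeping `context_complete` out of the model's hands is the point of the last layer: the model reports `complete_answer_found` as its own opinion, while the pipeline stamps the deterministic signal from what it knows about its retrieval configuration.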