Micro-Macro Retrieval: Reducing Long-Form Hallucination in Large Language Models

Large Language Models (LLMs) achieve impressive performance across many tasks but remain prone to hallucination, especially in long-form generation, where redundant retrieved contexts and lengthy reasoning chains amplify factual errors.

Recent studies highlight a critical phenomenon: the closer key information appears to the model's output, the higher the factual accuracy. However, existing retrieval-augmented language models (RALMs) lack an effective mechanism to ensure this proximity: they inject external evidence into the reasoning process via multi-turn retrieval, but this does not guarantee that key information stays close to the output.

We propose Micro-Macro Retrieval (M2R), a novel retrieve-while-generate framework to fill this gap. At the macro level, M2R retrieves coarse-grained evidence from external sources; at the micro level, it extracts essential results from a key information repository built during reasoning and reuses them while generating answers.
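
The two retrieval levels can be illustrated with a minimal Python sketch. It is not the M2R implementation: the word-overlap scorer, the names `macro_retrieve` and `KeyInfoRepository`, and the toy corpus are all assumptions made for illustration, standing in for whatever retriever and repository format the framework actually uses.

```python
import re
from dataclasses import dataclass, field


def overlap(query: str, text: str) -> int:
    """Count the distinct lowercase words shared by query and text."""
    tokens = lambda s: set(re.findall(r"\w+", s.lower()))
    return len(tokens(query) & tokens(text))


def macro_retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Macro level (hypothetical): top-k coarse passages from an external corpus."""
    return sorted(corpus, key=lambda p: overlap(query, p), reverse=True)[:k]


@dataclass
class KeyInfoRepository:
    """Micro level (hypothetical): short facts recorded while reasoning."""
    facts: list[str] = field(default_factory=list)

    def add(self, fact: str) -> None:
        self.facts.append(fact)

    def micro_retrieve(self, query: str, k: int = 1) -> list[str]:
        # Pull the most relevant stored facts so they can sit next to the answer.
        return sorted(self.facts, key=lambda f: overlap(query, f), reverse=True)[:k]


corpus = [
    "Marie Curie was born in Warsaw in 1867 and later moved to Paris.",
    "The Eiffel Tower was completed in 1889 for the World's Fair.",
    "Curie won Nobel Prizes in physics and in chemistry.",
]
question = "Where was Marie Curie born?"

# Macro step: fetch coarse evidence from the external source.
passages = macro_retrieve(question, corpus, k=2)

# During reasoning, distilled results are written to the repository.
repo = KeyInfoRepository()
repo.add("Marie Curie born in Warsaw")
repo.add("Marie Curie won two Nobel Prizes")

# Micro step: place the key fact directly before the answer slot.
key_facts = repo.micro_retrieve(question, k=1)
answer_prompt = f"Key facts: {'; '.join(key_facts)}\nQuestion: {question}\nAnswer:"
```

In this sketch the long passages returned by the macro step never need to sit next to the answer; only the short fact pulled from the repository does, which is the proximity property the paragraph above describes.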

This design directly addresses the key-information-to-output proximity bottleneck, effectively reducing hallucination in long-form tasks. M2R is trained with a curriculum-learning-based reinforcement learning strategy that uses customized rule-based rewards, enabling stable acquisition of retrieval and grounding skills.
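
As a rough illustration of what a rule-based reward under a curriculum can look like, consider the Python sketch below. The abstract does not specify the actual rules, so everything here is assumed: the `<answer>` and `<search>` tags, the reward values, and the two-stage schedule are hypothetical placeholders, not M2R's reward design.

```python
import re


def rule_based_reward(output: str, gold: str, stage: int) -> float:
    """Score one rollout with hand-written rules (all rules are hypothetical).

    Stage 1 rewards answer format and correctness only; stage 2 and later
    also penalize rollouts that never issued a retrieval call.
    """
    reward = 0.0
    match = re.search(r"<answer>(.*?)</answer>", output, re.DOTALL)
    if match:
        reward += 0.25  # format rule: the answer is wrapped in the expected tags
        if match.group(1).strip().lower() == gold.strip().lower():
            reward += 0.75  # correctness rule: exact match with the reference
    if stage >= 2 and "<search>" not in output:
        reward -= 0.25  # later curriculum stage: answering without retrieval is penalized
    return reward


with_search = "<search>Curie birthplace</search><answer>Warsaw</answer>"
without_search = "<answer>Warsaw</answer>"

early = rule_based_reward(without_search, "warsaw", stage=1)
late = rule_based_reward(without_search, "warsaw", stage=2)
full = rule_based_reward(with_search, "warsaw", stage=2)
```

Under these assumed rules the same unretrieved answer scores lower once the curriculum advances, which is one simple way a staged schedule can introduce the retrieval skill after the answer format has been learned.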

Extensive experiments across different benchmarks demonstrate the effectiveness of M2R, especially in long-context settings.