Micro-Macro Retrieval: Reducing Long-Form Hallucination in Large Language Models
Large language models (LLMs) achieve impressive performance across many tasks but remain prone to hallucination, especially in long-form generation, where redundant retrieved context and lengthy reasoning chains amplify factual errors.
Recent studies highlight a critical phenomenon: the closer key information appears to the model's output, the higher the factual accuracy. However, existing retrieval-augmented language models (RALMs) lack an effective mechanism to ensure this proximity: external evidence is injected into the reasoning via multi-turn retrieval, which does not guarantee that key information stays close to the output.
We propose Micro-Macro Retrieval (M2R), a novel retrieve-while-generate framework, to fill this gap. At the macro level, M2R retrieves coarse-grained evidence from external sources; at the micro level, it extracts essential results from a key information repository built during reasoning and reuses them while generating answers.
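The two retrieval levels can be illustrated with a toy sketch. It is not the M2R implementation: the word-overlap ranking, the `KeyInfoRepository` class, the `{key}` slot syntax, and all function names are hypothetical stand-ins chosen for illustration. It only shows the general idea of fetching coarse evidence first, recording key results during reasoning, and pulling them back in at the point where the answer is written.

```python
import re
from dataclasses import dataclass, field


def macro_retrieve(query: str, corpus: list[str], k: int = 1) -> list[str]:
    """Macro level (assumed stand-in): rank passages by word overlap with the query."""
    query_words = set(query.lower().split())
    ranked = sorted(
        corpus,
        key=lambda passage: len(query_words & set(passage.lower().split())),
        reverse=True,
    )
    return ranked[:k]


@dataclass
class KeyInfoRepository:
    """Micro level (hypothetical): key results recorded while reasoning."""
    facts: dict[str, str] = field(default_factory=dict)

    def store(self, key: str, value: str) -> None:
        self.facts[key] = value

    def micro_retrieve(self, key: str) -> str:
        # Unknown keys are marked rather than guessed.
        return self.facts.get(key, "[unknown]")


def write_answer(template: str, repo: KeyInfoRepository) -> str:
    """Fill each {key} slot from the repository at the moment it is written,
    so the stored result sits directly next to the output that uses it."""
    return re.sub(r"\{(\w+)\}", lambda m: repo.micro_retrieve(m.group(1)), template)


corpus = [
    "The Eiffel Tower was completed in 1889 in Paris.",
    "Mount Everest is 8849 metres tall.",
]
passages = macro_retrieve("When was the Eiffel Tower completed", corpus)

repo = KeyInfoRepository()
repo.store("completion_year", "1889")  # key result noted during reasoning

answer = write_answer("The tower was finished in {completion_year}.", repo)
print(answer)
```

In this sketch the answer never has to reach back through the retrieved passage or a long reasoning trace; each fact is looked up from the small repository exactly where it is used.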
This design directly addresses the key-information-to-output proximity bottleneck, effectively reducing hallucination in long-form tasks. M2R is trained with a curriculum-learning-based reinforcement learning strategy using customized rule-based rewards, enabling stable acquisition of retrieval and grounding skills.
Extensive experiments across different benchmarks demonstrate the effectiveness of M2R, especially in long-context settings.