Search for Truth from Reasoning: A Dynamic Representation Editing Framework for Steering LLM Trajectories
Abstract: Current approaches to enhancing Large Language Model (LLM) reasoning, such as Chain-of-Thought and "Wait" prompts, primarily encourage models to think more, yet often fail to guide them toward the truth. While Representation Editing (RepE) offers intrinsic control, its application to dynamic reasoning trajectories remains underexplored.
In this work, we bridge this gap by investigating the geometry of truth within unfolding reasoning chains. We uncover three critical insights: (1) truth is encoded at the sentence level and is entangled with latent reasoning patterns; (2) effective intervention follows an Uncertainty Principle and a Decay Effect, requiring localization to early, high-entropy forks; (3) naive steering vectors suffer from noise, risking collateral damage to correct trajectories.
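To make the notion of an "early, high-entropy fork" concrete, the following is a minimal sketch, not the authors' implementation. It assumes that uncertainty is measured as the Shannon entropy of the next-token distribution and that "early" means a step index below a cutoff. The function names, `threshold`, and `max_step` are hypothetical and chosen only for illustration.

```python
import numpy as np

def token_entropy(logits):
    """Shannon entropy (in nats) of the softmax distribution over one logit vector."""
    z = np.asarray(logits, dtype=np.float64)
    z = z - z.max()              # subtract the max for numerical stability
    p = np.exp(z)
    p /= p.sum()
    nz = p > 0                   # skip zero-probability entries (0 * log 0 := 0)
    return float(-(p[nz] * np.log(p[nz])).sum())

def should_intervene(logits, step, threshold=1.0, max_step=64):
    """Hypothetical gate: intervene only at early steps whose entropy is high.

    `threshold` and `max_step` are illustrative values, not taken from the paper.
    """
    return step < max_step and token_entropy(logits) > threshold
```

Under these assumptions, a near-uniform distribution at an early step triggers the gate, while a sharply peaked distribution or a late step does not.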
Based on these findings, we propose DynaSteer, a dynamic RepE framework. DynaSteer employs pattern clustering to disentangle reasoning manifolds and uses Fisher-LDA to extract a purified truth direction. By dynamically monitoring lookahead entropy, it steers selectively and rolls back trajectories only when necessary.
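As a rough illustration of how a Fisher-LDA direction can be obtained from hidden states, here is a minimal sketch under stated assumptions. It assumes two sets of hidden-state vectors labeled correct and incorrect (in a pipeline like the one described, these would presumably come from a single reasoning-pattern cluster) and uses the textbook two-class Fisher discriminant with a small ridge term. The names `fisher_direction`, `steer`, `ridge`, and `alpha` are hypothetical and do not come from the paper.

```python
import numpy as np

def fisher_direction(h_true, h_false, ridge=1e-3):
    """Unit-norm two-class Fisher discriminant direction, w ∝ S_w^{-1} (mu_true - mu_false).

    h_true, h_false: arrays of shape (n_samples, hidden_dim).
    ridge: small diagonal regularizer (an assumption) that keeps S_w invertible.
    """
    h_true = np.asarray(h_true, dtype=np.float64)
    h_false = np.asarray(h_false, dtype=np.float64)
    mu_t, mu_f = h_true.mean(axis=0), h_false.mean(axis=0)
    ct, cf = h_true - mu_t, h_false - mu_f
    s_w = ct.T @ ct + cf.T @ cf                 # within-class scatter matrix
    s_w += ridge * np.eye(s_w.shape[0])
    w = np.linalg.solve(s_w, mu_t - mu_f)       # avoids forming an explicit inverse
    return w / np.linalg.norm(w)

def steer(h, w, alpha=1.0):
    """Shift a hidden state along direction w by strength alpha (illustrative only)."""
    return np.asarray(h, dtype=np.float64) + alpha * w
```

Compared with a plain difference of class means, the Fisher direction down-weights dimensions with large within-class variance, which is one standard way to reduce noise in a steering vector.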
Comprehensive experimental results on several MATH benchmarks verify the effectiveness of DynaSteer, and experiments on out-of-domain coding tasks further confirm its generalization ability. Our code is publicly available at this https URL.