Observable Patterns Are Not Explanations: A Causal-Geometric Analysis of Latent Reasoning Models
Latent reasoning models (LRMs) replace explicit chain-of-thought with continuous thoughts. Recent work treats observable latent-state patterns, such as frontiers resembling breadth-first search (BFS) and decodable arithmetic computation, as evidence for internal reasoning mechanisms.
We evaluate two LRMs (Coconut and CODI) against controls that lack the proposed recurrence or curriculum. These patterns also appear in the controls, and they do not always causally affect behavior.
Causal interventions reveal that latent-thought utilization is not binary but graded, scaling with a thought's causal effect on model behavior. Geometric analyses reveal that this effect concentrates in low-rank directions whose step-to-step geometry grows more structured as their behavioral influence increases.
Latent thoughts should therefore be treated as hidden computation, not hidden explanation: decodability, attention, or static structure alone cannot establish mechanism. Interpreting LRMs requires matched controls and causal tests.
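
The gap between decodability and mechanism can be illustrated with a minimal sketch. It assumes a toy linear setting that is not taken from the paper: the hidden states `H`, the directions `u_used` and `u_unused`, and the helpers `readout`, `probe_r2`, and `ablation_effect` are all hypothetical names introduced here for illustration. One feature is linearly decodable from the hidden state yet never read by the toy output, so a probe recovers it almost perfectly while ablating its direction leaves behavior unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 2000, 16  # number of samples, hidden dimension (toy values)

# Two orthonormal directions in the hidden space (hypothetical toy setup).
q, _ = np.linalg.qr(rng.normal(size=(d, 2)))
u_used, u_unused = q[:, 0], q[:, 1]

a = rng.normal(size=n)  # feature the toy readout depends on
b = rng.normal(size=n)  # feature that is encoded but never read out
noise = 0.01 * rng.normal(size=(n, d))
H = np.outer(a, u_used) + np.outer(b, u_unused) + noise  # toy hidden states


def readout(h):
    """Toy 'behavior': a linear readout along u_used only."""
    return h @ u_used


def probe_r2(hidden, target):
    """R^2 of a least-squares linear probe predicting target from hidden."""
    w, *_ = np.linalg.lstsq(hidden, target, rcond=None)
    pred = hidden @ w
    ss_res = np.sum((target - pred) ** 2)
    ss_tot = np.sum((target - target.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def ablation_effect(hidden, direction):
    """Mean absolute change in the readout after projecting out a direction."""
    direction = direction / np.linalg.norm(direction)
    ablated = hidden - np.outer(hidden @ direction, direction)
    return float(np.mean(np.abs(readout(hidden) - readout(ablated))))


r2_unused = probe_r2(H, b)                     # decodability of the unused feature
effect_unused = ablation_effect(H, u_unused)   # causal effect of removing it
effect_used = ablation_effect(H, u_used)       # causal effect of the used feature

print(f"probe R^2 for unused feature: {r2_unused:.4f}")
print(f"ablation effect (unused direction): {effect_unused:.2e}")
print(f"ablation effect (used direction): {effect_used:.3f}")
```

In this toy setting the probe alone would suggest that `b` matters, and only the intervention separates it from `a`. A real LRM analysis would replace the linear readout with the model's actual behavior and compare the result against a matched control.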