Formalizing Latent Thoughts: Four Axioms of Thought Representation in LLMs

Abstract: We introduce an axiomatic evaluation framework for latent thought representations in LLMs. Its metrics are independent of downstream benchmark scores and reveal representational failures that benchmark accuracy masks.

Existing evaluations conflate representation quality with model capacity. As a result, when a model performs poorly, the failure cannot be attributed to the representation itself rather than to the model that processes it.

We formalize four functional axioms (Causality, Minimality, Separability, and Stability) and define a quantitative measure for each, computed directly on the representation and independently of downstream accuracy.

We audit open-weight LLMs across 23 reasoning tasks (e.g., Spatial Reasoning, Factual QA). We find that no candidate satisfies all four axioms simultaneously, that the representations reliably distinguish task type but cannot distinguish between two questions within the same task, and that the representations encode little information beyond what is already present in the input embedding.

The failure is consistent across dense, reasoning-distilled, and RL-trained model families, indicating that the gap is structural rather than a property of model size or training procedure.