Forecasting Future Behavior as a Learning Task

将预测未来行为作为一项学习任务

Abstract: Trust in an AI system is often anchored by explanations of how it works, which one then uses to forecast its behavior on new inputs. For large reasoning models (LRMs), this conventional route is particularly difficult to follow: explanation methods for single token generations do not naturally generalize to long trajectories, and the trajectories themselves are often not faithful when read as natural language.

摘要： 对人工智能系统的信任通常建立在对其工作原理的解释之上，人们据此预测其在处理新输入时的行为。对于大型推理模型（LRM）而言，这种传统路径尤其难以遵循：针对单个 Token 生成的解释方法无法自然地推广到长轨迹，且当这些轨迹被解读为自然语言时，其本身往往缺乏忠实度。

We propose an alternative that bypasses the explanation step: treat behavior forecasting as a learnable task and train Behavior Forecasters that operates on a single reasoning trajectory to make the same forecasts one would typically seek from an explanation. The forecaster’s training data is obtained by querying the LRM with no human annotation, and its inference is done in a single forward pass.

我们提出了一种绕过解释步骤的替代方案：将行为预测视为一项可学习的任务，并训练“行为预测器”（Behavior Forecasters）。该预测器基于单一推理轨迹进行操作，能够做出通常需要通过解释才能获得的预测。预测器的训练数据通过查询 LRM 获得，无需人工标注，且其推理过程仅需一次前向传播即可完成。

We instantiate this approach on two tasks: how likely the LRM is to repeat its answer on re-runs, and how removing parts of the input changes its answer. We evaluate this approach on both tasks across three diverse reasoning datasets and find that trained Behavior Forecasters are more accurate than GPT-5.4 and Claude Opus-4.6 reading the same trajectories as naive readers, at a small fraction of their inference cost.

我们将此方法应用于两项任务：LRM 在重新运行时重复其答案的可能性，以及移除部分输入如何改变其答案。我们在三个不同的推理数据集上对这两项任务进行了评估，结果发现，训练后的“行为预测器”比作为普通阅读者读取相同轨迹的 GPT-5.4 和 Claude Opus-4.6 更准确，且推理成本仅为后者的一小部分。

We find that fine-tuning the backbone end-to-end and initializing it from the target LRM are each necessary for strong performance. These results show that the reasoning trajectory carries information about the LRM’s future behavior that goes beyond what naive reading conveys.

我们发现，对骨干模型进行端到端的微调，并从目标 LRM 进行初始化，对于实现高性能而言都是必不可少的。这些结果表明，推理轨迹中携带了关于 LRM 未来行为的信息，这些信息超出了普通阅读所能传达的范畴。