Position: Don't Just "Fix it in Post": A Science of AI Must Study Training Dynamics

Abstract: What would it mean to have a scientific understanding of AI? Models are not static objects: they are snapshots of time-evolving processes shaped by data, objectives, architectures, and optimization dynamics. Yet much of AI research treats models as fixed artifacts, analyzing behaviors after training rather than asking why they emerge.

This position paper argues that a science of AI must move beyond post-hoc fixes and study the training dynamics that produce model behavior. Such a science should support progressively stronger forms of understanding: predicting outcomes from early training signals, intervening when trajectories go wrong, and ultimately designing training procedures that more reliably produce desired properties.

Scaling laws have made prediction routine for loss; the challenge is extending this success to capabilities, biases, robustness, and safety-relevant behaviors. We articulate requirements for such theories grounded in the history and philosophy of science, examine progress in mechanistic interpretability, fairness, memorization, and simplicity bias, and identify concrete open problems.
