FormalScience: Scalable Human-in-the-Loop Autoformalisation of Science with Agentic Code Generation in Lean

FormalScience：基于 Lean 智能体代码生成的科学领域人机协同自动形式化

Abstract: Formalising informal mathematical reasoning into formally verifiable code is a significant challenge for large language models. In scientific fields such as physics, domain-specific machinery (\textit{e.g.} Dirac notation, vector calculus) imposes additional formalisation challenges that modern LLMs and agentic approaches have yet to tackle.

摘要： 将非形式化的数学推理转化为可形式化验证的代码，对于大语言模型（LLM）而言是一项重大挑战。在物理学等科学领域，特定领域的机制（例如狄拉克符号、向量微积分）带来了额外的形式化难题，而现代大语言模型和智能体方法尚未能有效解决这些问题。

To aid autoformalisation in scientific domains, we present FormalScience; a domain-agnostic human-in-the-loop agentic pipeline that enables a single domain expert (without deep formal language experience) to produce \textit{syntactically correct} and \textit{semantically aligned} formal proofs of informal reasoning for low economic cost.

为了辅助科学领域的自动形式化，我们提出了 FormalScience；这是一个与领域无关的人机协同智能体流水线，它使单一领域专家（无需深厚的形式语言经验）能够以低经济成本，生成“语法正确”且“语义对齐”的非形式化推理形式证明。

Applying FormalScience to physics, we construct FormalPhysics, a dataset of 200 university-level (LaTeX) physics problems and solutions (primarily quantum mechanics and electromagnetism), along with their Lean4 formal representations. Compared to existing formal math benchmarks, FormalPhysics achieves perfect formal validity and exhibits greater statement complexity.

我们将 FormalScience 应用于物理学，构建了 FormalPhysics 数据集。该数据集包含 200 道大学水平的（LaTeX 格式）物理问题及其解答（主要涵盖量子力学和电磁学），以及它们对应的 Lean4 形式化表示。与现有的形式数学基准相比，FormalPhysics 实现了完美的形式有效性，并展现出更高的陈述复杂度。

We evaluate open-source models and proprietary systems on a statement autoformalisation task on our dataset via zero-shot prompting, self-refinement with error feedback, and a novel multi-stage agentic approach, and explore autoformalisation limitations in modern LLM-based approaches.

我们通过零样本提示（zero-shot prompting）、基于错误反馈的自我修正以及一种新颖的多阶段智能体方法，在我们的数据集上对开源模型和闭源系统进行了陈述自动形式化任务的评估，并探讨了现代基于大语言模型的方法在自动形式化方面的局限性。

We provide the first systematic characterisation of semantic drift in physics autoformalisation in terms of concepts such as notational collapse and abstraction elevation which reveals what formal language verifies when full semantic preservation is unattainable.

我们首次从符号坍缩（notational collapse）和抽象提升（abstraction elevation）等概念出发，对物理学自动形式化中的语义漂移进行了系统性表征，揭示了当无法实现完全语义保留时，形式语言究竟验证了什么。

We release the codebase together with an interactive UI-based FormalScience system which facilitates autoformalisation and theorem proving in scientific domains beyond this.

我们发布了代码库以及一个基于交互式 UI 的 FormalScience 系统，旨在促进科学领域内更广泛的自动形式化和定理证明工作。

Project Link: https://github.com/jmeadows17/formal-science 项目链接： https://github.com/jmeadows17/formal-science