ReaComp: Compiling LLM Reasoning into Symbolic Solvers for Efficient Program Synthesis

Abstract: Large language models (LLMs) can solve program synthesis tasks but remain inefficient and unreliable on hard instances that require large combinatorial search. Given a small set of reasoning traces, we use coding agents to compile them into reusable symbolic program synthesizers over constrained domain-specific languages (DSLs).
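
As a concrete picture of a "symbolic program synthesizer over a constrained DSL", here is a minimal sketch. It is not the solver the coding agents induce. It assumes a hypothetical toy DSL in which a program is an ordered list of string-rewrite rules, and it uses plain enumerative search. The names `run`, `substrings`, and `synthesize` are illustrative. The point is that once such a search procedure exists, solving a new instance needs no LLM calls.

```python
from itertools import product
from typing import List, Optional, Set, Tuple

# Hypothetical toy DSL: a program is an ordered list of (pattern, replacement)
# rewrite rules, applied left to right with str.replace.
Rule = Tuple[str, str]
Program = List[Rule]


def run(program: Program, s: str) -> str:
    """Apply each rewrite rule to the string, in order."""
    for pattern, replacement in program:
        s = s.replace(pattern, replacement)
    return s


def substrings(s: str, max_len: int) -> Set[str]:
    """All non-empty substrings of s with at most max_len characters."""
    return {
        s[i:j]
        for i in range(len(s))
        for j in range(i + 1, min(len(s), i + max_len) + 1)
    }


def synthesize(
    examples: List[Tuple[str, str]], max_rules: int = 2, max_len: int = 2
) -> Optional[Program]:
    """Enumerate rule sequences by increasing length and return the first
    program consistent with every (input, output) example, or None."""
    # Candidate patterns are drawn from the inputs, replacements from the
    # outputs; the empty replacement allows deletions.
    patterns = set().union(*(substrings(i, max_len) for i, _ in examples))
    replacements = set().union(*(substrings(o, max_len) for _, o in examples))
    replacements.add("")
    # Sorted for a deterministic search order; identity rewrites are skipped.
    rules = [r for r in sorted(product(patterns, replacements)) if r[0] != r[1]]
    for depth in range(max_rules + 1):
        for program in product(rules, repeat=depth):
            if all(run(list(program), i) == o for i, o in examples):
                return list(program)
    return None


print(synthesize([("kata", "gata"), ("kila", "gila")]))  # [('k', 'g')]
```

The search cost grows exponentially with `max_rules`. That combinatorial blow-up is the kind of search that makes hard instances expensive, whether it is done symbolically or by an LLM.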

The resulting solvers require no LLM calls at test time and are strong standalone systems. Symbolic solver ensembles reach 91.3% accuracy on PBEBench-Lite and 84.7% on PBEBench-Hard. On PBEBench-Hard they outperform LLMs with test-time scaling by 16.3 percentage points at zero LLM inference cost.

They also complement LLM search, improving PBEBench-Hard accuracy from 68.4% to 85.8% while reducing reported token usage by 78%, and raising SLR-Bench hard-tier accuracy from 34.4% to 58.0% in a neuro-symbolic hybrid setting.
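
The gains above are absolute differences in percentage points, not relative improvements. The helpers below are a hypothetical illustration that makes the arithmetic explicit.

The 16.3-point margin on PBEBench-Hard equals 84.7 minus 68.4. This suggests the test-time-scaling baseline is the same 68.4% figure quoted for LLM search, although the abstract does not state this outright. The 100,000-token baseline in the token example is a placeholder, not a reported number.

```python
def pp_gain(before: float, after: float) -> float:
    """Absolute improvement in percentage points, rounded to 0.1."""
    return round(after - before, 1)


def relative_gain(before: float, after: float) -> float:
    """Improvement relative to the baseline, in percent, rounded to 0.1."""
    return round(100 * (after - before) / before, 1)


def tokens_remaining(baseline_tokens: float, reduction_pct: float) -> int:
    """Token budget left after a given percentage reduction."""
    return round(baseline_tokens * (1 - reduction_pct / 100))


print(pp_gain(68.4, 84.7))        # 16.3  standalone ensemble, PBEBench-Hard
print(pp_gain(68.4, 85.8))        # 17.4  hybrid, PBEBench-Hard
print(pp_gain(34.4, 58.0))        # 23.6  hybrid, SLR-Bench hard tier
print(relative_gain(34.4, 58.0))  # 68.6  same SLR-Bench gain, relative to baseline
print(tokens_remaining(100_000, 78))  # 22000 for a hypothetical 100k-token run
```

A 78% token reduction leaves 22% of the original budget. For any baseline, that is roughly 4.5 times fewer tokens.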

Compared to directly using coding agents as per-instance solvers, induced solvers are substantially more Pareto-efficient, amortizing a small one-time construction cost over many zero-token executions.
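
The amortization argument can be made concrete with a small cost model. Suppose an induced solver costs `build_tokens` once and zero tokens per run, while a per-instance agent costs a fixed number of tokens on every instance. The solver is then cheaper in total once the number of instances exceeds `build_tokens / agent_tokens_per_instance`.

The sketch below also computes a Pareto front over (cost, accuracy) pairs. All token counts and accuracies in it are hypothetical placeholders, not figures from the paper, and the function names are illustrative.

```python
from typing import Dict, List, Tuple


def amortized_cost(build_tokens: float, per_instance_tokens: float, n: int) -> float:
    """Average tokens per instance when a one-time build cost is spread over n runs."""
    return (build_tokens + per_instance_tokens * n) / n


def break_even(build_tokens: float, agent_tokens_per_instance: float) -> float:
    """Number of instances after which a zero-token solver built once is cheaper
    in total than calling an agent on every instance."""
    return build_tokens / agent_tokens_per_instance


def pareto_front(systems: Dict[str, Tuple[float, float]]) -> List[str]:
    """Names of systems whose (cost, accuracy) is not dominated: no other system
    is at least as cheap and at least as accurate, and strictly better on one."""
    front = []
    for name, (cost, acc) in systems.items():
        dominated = any(
            c <= cost and a >= acc and (c < cost or a > acc)
            for other, (c, a) in systems.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return front


# Illustrative numbers only, not results from the paper.
BUILD, AGENT = 2_000_000, 50_000
print(break_even(BUILD, AGENT))          # 40.0
print(amortized_cost(BUILD, 0, 1_000))  # 2000.0
systems = {
    "induced_solver": (amortized_cost(BUILD, 0, 1_000), 0.80),
    "agent_per_instance": (AGENT, 0.85),
    "dominated_system": (60_000, 0.70),
}
print(pareto_front(systems))  # ['induced_solver', 'agent_per_instance']
```

As the number of instances grows, the induced solver's amortized cost tends toward zero. Its (cost, accuracy) point therefore moves left on the Pareto plot, while a per-instance agent's point stays fixed.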

Finally, most solvers transfer zero-shot to a real historical linguistics task, predicting sound changes in natural-language data. Under ensembling they reach 80.1% accuracy, and they recover some plausible linguistic rules.
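
The abstract does not specify how solver outputs are combined "under ensembling". One plausible scheme, assumed here purely for illustration, is verification-filtered majority voting:

- Each solver proposes a program from the training word pairs.
- Programs that fail to reproduce every pair are discarded.
- The survivors vote on the output for an unseen word.

The toy solvers (`p_to_f`, `p_to_b`, `memorizer`, `gives_up`) and the p-to-f rewrite are hypothetical stand-ins, not rules recovered in the paper.

```python
from collections import Counter
from typing import Callable, List, Optional, Tuple

Examples = List[Tuple[str, str]]
# Hypothetical solver interface: given training pairs, return a string-to-string
# program, or None if no program was found.
Solver = Callable[[Examples], Optional[Callable[[str], str]]]


def ensemble_predict(solvers: List[Solver], train: Examples, query: str) -> Optional[str]:
    """Verification-filtered majority vote (an assumed scheme): discard programs
    that do not reproduce every training pair, then let the survivors vote on
    the output for the query. Ties go to the prediction seen first."""
    votes: Counter = Counter()
    for solve in solvers:
        program = solve(train)
        if program is None:
            continue
        if all(program(i) == o for i, o in train):
            votes[program(query)] += 1
    return votes.most_common(1)[0][0] if votes else None


# Toy solvers standing in for induced ones (all hypothetical).
def p_to_f(train: Examples) -> Callable[[str], str]:
    return lambda s: s.replace("p", "f")


def p_to_b(train: Examples) -> Callable[[str], str]:
    return lambda s: s.replace("p", "b")  # fails verification on the data below


def memorizer(train: Examples) -> Callable[[str], str]:
    table = dict(train)
    return lambda s: table.get(s, s)  # fits training data, copies unseen input


def gives_up(train: Examples) -> None:
    return None


train = [("pata", "fata"), ("pipo", "fifo")]
print(ensemble_predict([p_to_f, p_to_f, p_to_b, memorizer, gives_up], train, "pul"))  # ful
```

Verifying against the training pairs first filters out wrong rules such as `p_to_b`. The vote then limits the influence of programs like `memorizer`, which fit the data without generalizing.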

Together, these results show that reasoning traces can be compiled into reusable symbolic solvers that solve many tasks directly, complement LLM inference on hard cases, and provide a scalable route to domain-general solver induction. We release code and data for reproducibility.
