ReaComp: Compiling LLM Reasoning into Symbolic Solvers for Efficient Program Synthesis

Abstract: LLMs can solve program synthesis tasks but remain inefficient and unreliable on hard instances requiring large combinatorial search. Given a small set of reasoning traces, we use coding agents to compile them into reusable symbolic program synthesizers over constrained DSLs.
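To make the idea of a symbolic program synthesizer over a constrained DSL concrete, the following is a minimal sketch, not the solvers induced in this work. It assumes a toy DSL in which a program is an ordered list of single-character replacement rules, and it searches by plain enumeration; the names `run_program` and `synthesize`, the alphabet, and the example pairs are all hypothetical.

```python
from itertools import product
from typing import List, Optional, Sequence, Tuple

# Hypothetical toy DSL: a rule (old, new) rewrites every occurrence of `old`.
Rule = Tuple[str, str]
Example = Tuple[str, str]  # (input string, expected output string)


def run_program(program: Sequence[Rule], text: str) -> str:
    """Apply each replacement rule in order to the input string."""
    for old, new in program:
        text = text.replace(old, new)
    return text


def synthesize(examples: List[Example], alphabet: str,
               max_rules: int = 2) -> Optional[List[Rule]]:
    """Enumerate rule sequences, shortest first, and return the first one
    that reproduces every input-output example (or None if none does)."""
    rules = [(a, b) for a in alphabet for b in alphabet if a != b]
    for length in range(1, max_rules + 1):
        for candidate in product(rules, repeat=length):
            if all(run_program(candidate, x) == y for x, y in examples):
                return list(candidate)
    return None


# Illustrative examples only: both pairs are explained by p -> b, t -> d.
examples = [("pata", "bada"), ("tapa", "daba")]
program = synthesize(examples, alphabet="abdpt")
```

The search space here grows exponentially with `max_rules`, which is the kind of combinatorial blow-up the abstract refers to; a practical solver would need pruning or smarter search rather than blind enumeration.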

The resulting solvers require no LLM calls at test time and are strong standalone systems: symbolic solver ensembles reach 91.3% accuracy on PBEBench-Lite and 84.7% on PBEBench-Hard. On the latter, they outperform LLMs with test-time scaling by 16.3 percentage points at zero LLM inference cost.

They also complement LLM search, improving PBEBench-Hard accuracy from 68.4% to 85.8% while reducing reported token usage by 78%, and raising SLR-Bench hard-tier accuracy from 34.4% to 58.0% in a neuro-symbolic hybrid setting.
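One plausible way to arrange such a hybrid is to try the symbolic solvers first and call the LLM only when none of them produces a program that verifies on the examples. The sketch below illustrates that control flow under stated assumptions; it is not the pipeline described in this work, and `solve_hybrid`, the toy solvers, and the stubbed `llm_fallback` are hypothetical names.

```python
from typing import Callable, List, Optional, Tuple

Example = Tuple[str, str]
Program = Callable[[str], str]
Solver = Callable[[List[Example]], Optional[Program]]


def verifies(program: Program, examples: List[Example]) -> bool:
    """A candidate is accepted only if it reproduces every example."""
    return all(program(x) == y for x, y in examples)


def solve_hybrid(examples: List[Example], solvers: List[Solver],
                 llm_fallback: Solver) -> Tuple[Optional[Program], str]:
    """Try each zero-token symbolic solver; fall back to the LLM last."""
    for solver in solvers:
        candidate = solver(examples)
        if candidate is not None and verifies(candidate, examples):
            return candidate, "symbolic"
    candidate = llm_fallback(examples)  # the only step that spends tokens
    if candidate is not None and verifies(candidate, examples):
        return candidate, "llm"
    return None, "unsolved"


# Toy stand-ins for induced solvers (assumptions for illustration).
def upper_solver(examples: List[Example]) -> Optional[Program]:
    return str.upper


def reverse_solver(examples: List[Example]) -> Optional[Program]:
    return lambda s: s[::-1]


llm_calls = []


def llm_fallback(examples: List[Example]) -> Optional[Program]:
    llm_calls.append(examples)  # stub: records the call, returns no program
    return None


_, source_a = solve_hybrid([("abc", "cba")], [upper_solver, reverse_solver], llm_fallback)
_, source_b = solve_hybrid([("abc", "xyz")], [upper_solver, reverse_solver], llm_fallback)
```

Because every candidate is checked against the examples before it is accepted, the fallback is reached only on instances the symbolic solvers cannot handle, which is where token savings in such a setup would come from.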

Compared to directly using coding agents as per-instance solvers, induced solvers are substantially more Pareto-efficient, amortizing a small one-time construction cost over many zero-token executions.
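The amortization argument can be sketched with simple arithmetic: a solver built once at some token cost and then run for free becomes cheaper per instance than a per-instance agent once enough instances have been solved. The token figures below are placeholders chosen for illustration, not numbers reported in this work, and the function names are hypothetical.

```python
import math


def amortized_tokens_per_instance(construction_tokens: int, n_instances: int) -> float:
    """Per-instance cost of a solver built once and then run at zero tokens."""
    return construction_tokens / n_instances


def break_even_instances(construction_tokens: int, agent_tokens_per_instance: int) -> int:
    """Smallest instance count at which the one-time construction cost is no
    larger than the cumulative cost of a per-instance agent."""
    return math.ceil(construction_tokens / agent_tokens_per_instance)


# Placeholder figures (assumptions, not measurements).
CONSTRUCTION_TOKENS = 2_000_000
AGENT_TOKENS_PER_INSTANCE = 50_000

break_even = break_even_instances(CONSTRUCTION_TOKENS, AGENT_TOKENS_PER_INSTANCE)
per_instance_at_1000 = amortized_tokens_per_instance(CONSTRUCTION_TOKENS, 1000)
```

Under these placeholder figures the induced solver breaks even after 40 instances, and at 1,000 instances its amortized cost is 2,000 tokens per instance against 50,000 for the per-instance agent.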

Finally, most solvers transfer zero-shot to a real historical linguistics task (predicting sound changes in natural language data), reaching 80.1% accuracy under ensembling and recovering some plausible linguistic rules.

Together, these results show that reasoning traces can be compiled into reusable symbolic solvers that solve many tasks directly, complement LLM inference on hard cases, and provide a scalable route to domain-general solver induction. We release code and data for reproducibility.
