TriVAL: A Tri-Validation Framework for Faithful Automatic Optimization Modeling

Optimization modeling is the bridge between natural-language problem descriptions and optimization solvers, and it remains a cornerstone for bringing operations research (OR) into real-world decision-making.

Recent advances in large language models (LLMs) have driven significant progress in automatic optimization modeling. However, existing methods still lack explicit validation during the modeling process, so errors introduced in earlier stages carry through the pipeline and reduce final modeling accuracy.

To address this challenge, we introduce TriVAL, a tri-validation framework that performs explicit validation at three stages of automatic optimization modeling: semantic specification, mathematical formulation, and code generation.

At each stage, TriVAL follows a construct-validate-revise loop that assesses the current result against stage-specific criteria and revises it when needed. This design identifies and corrects errors before they accumulate across stages, which helps preserve faithfulness throughout the modeling process.
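The construct-validate-revise loop can be pictured with a minimal sketch. This is an illustration under stated assumptions, not the authors' implementation: the `Stage` container, the `run_stage` and `run_pipeline` helpers, the `max_rounds` cap, and the toy string-based validators are all hypothetical names introduced here. In a real system the three callables would presumably wrap LLM calls and stage-specific checks.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical signatures: artifacts are plain strings, and a validator
# returns a list of issues (an empty list means the draft passes).
Construct = Callable[[str], str]
Validate = Callable[[str, str], List[str]]
Revise = Callable[[str, str, List[str]], str]


@dataclass
class Stage:
    name: str
    construct: Construct  # builds a first draft from the upstream artifact
    validate: Validate    # (upstream, draft) -> list of issues
    revise: Revise        # (upstream, draft, issues) -> revised draft


def run_stage(stage: Stage, upstream: str, max_rounds: int = 3) -> str:
    """Construct a draft, then validate and revise it up to max_rounds times."""
    draft = stage.construct(upstream)
    for _ in range(max_rounds):
        issues = stage.validate(upstream, draft)
        if not issues:
            break  # draft meets the stage-specific criteria
        draft = stage.revise(upstream, draft, issues)
    return draft


def run_pipeline(stages: List[Stage], problem: str, max_rounds: int = 3) -> str:
    """Feed each stage's validated output into the next stage."""
    artifact = problem
    for stage in stages:
        artifact = run_stage(stage, artifact, max_rounds)
    return artifact


def toy_stage(name: str, tag: str) -> Stage:
    """Toy stage (assumption): the first draft omits `tag`; revision appends it."""
    return Stage(
        name=name,
        construct=lambda upstream: upstream,
        validate=lambda upstream, draft: [] if draft.endswith(tag) else [f"missing {tag}"],
        revise=lambda upstream, draft, issues: draft + tag,
    )


stages = [
    toy_stage("semantic specification", "|spec"),
    toy_stage("mathematical formulation", "|math"),
    toy_stage("code generation", "|code"),
]
result = run_pipeline(stages, "maximize profit")
print(result)
```

Each toy stage fails validation once and passes after a single revision, so an error is repaired inside the stage that produced it instead of being handed downstream. The cap on revision rounds is a design assumption that keeps the loop from running indefinitely when a validator never passes.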

To evaluate automatic optimization modeling on more challenging combinatorial problems, we further introduce NL4COP, a benchmark of 150 instances across 50 diverse problem types. Compared with existing benchmarks, its problems have more complex decision logic, more tightly coupled constraints, and more demanding modeling requirements.

Experiments on NL4COP and established benchmarks show that TriVAL consistently outperforms state-of-the-art methods, with the largest gains on the most challenging problems.