Constraint acquisition needs better benchmarks

Research on Constraint Acquisition (CA), and on the related tasks of validating and enhancing Mathematical Programming (MP) models from domain knowledge artifacts, is currently limited by inadequate benchmarks. This deficiency impedes reproducibility and cross-study comparability, slowing the maturation of CA methods.

Existing benchmarks were designed for solver evaluation rather than for assessing CA algorithms. They are loosely organized, treat individual problems inconsistently, and omit the domain knowledge artifacts required by CA methods.

This work presents MPMMine, a benchmark suite for assessing algorithms that discover, validate, and enhance MP models from diverse domain knowledge artifacts. Its design is guided by six principles: consistency, standardization, completeness, extensibility, openness, and version control.

It adopts a uniform structure and relies on open formats: MiniZinc, CommonMark, and JSON. It provides multiple models per problem, tens of instances per model, and thousands of solutions and non-solutions in both integer and continuous domains, alongside natural-language descriptions to support text-to-model methods.
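
As an illustration of how labeled solutions and non-solutions stored as JSON could be used to evaluate a CA method, the following minimal Python sketch scores a candidate constraint set by the fraction of solutions it accepts and the fraction of non-solutions it rejects. The field names `solutions` and `non_solutions`, the variables `x` and `y`, and the helper functions are assumptions made for this example; they are not taken from the MPMMine schema.

```python
import json

# Hypothetical payload: the keys "solutions" / "non_solutions" and the
# variables x (integer) and y (continuous) are illustrative assumptions,
# not the actual MPMMine file layout.
EXAMPLES_JSON = """
{
  "solutions":     [{"x": 1, "y": 2.5}, {"x": 3, "y": 0.0}],
  "non_solutions": [{"x": 4, "y": 3.0}, {"x": -1, "y": 1.0}]
}
"""


def satisfies(assignment, constraints):
    """Return True if the assignment satisfies every constraint."""
    return all(constraint(assignment) for constraint in constraints)


def score(constraints, examples):
    """Fraction of solutions accepted and of non-solutions rejected."""
    solutions = examples["solutions"]
    non_solutions = examples["non_solutions"]
    accepted = sum(satisfies(a, constraints) for a in solutions)
    rejected = sum(not satisfies(a, constraints) for a in non_solutions)
    return {
        "solutions_accepted": accepted / len(solutions),
        "non_solutions_rejected": rejected / len(non_solutions),
    }


examples = json.loads(EXAMPLES_JSON)

# Candidate model a CA method might return: x >= 0 and x + y <= 5.
candidate = [
    lambda a: a["x"] >= 0,
    lambda a: a["x"] + a["y"] <= 5,
]

# A weaker candidate that misses the bound x >= 0.
weaker = [lambda a: a["x"] + a["y"] <= 5]

result = score(candidate, examples)
weaker_result = score(weaker, examples)
print(result)
print(weaker_result)
```

Here the weaker candidate still accepts every solution but rejects only one of the two non-solutions, which is the kind of difference that non-solutions in a benchmark make measurable.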
