A Deep Reinforcement Learning (DRL)-Based Transformer Method for Solving the Open Shop Scheduling Problem

Abstract: The open shop scheduling problem (OSSP) arises in many industrial and service settings but remains computationally challenging as the number of jobs and machines increases. While exact methods quickly become intractable, classical dispatching rules and metaheuristics may require substantial tuning to maintain solution quality at large scales.

This study develops a Transformer-based scheduling policy for OSSP using an encoder-decoder architecture with multi-head attention. The model is trained on Taillard benchmark instances (4x4, 5x5, 7x7, and 10x10) using only the processing-time matrix as input and produces feasible schedules with makespans typically within 15-30% of best-known values.
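
The paragraph above describes the policy only at the architecture level. The following is a minimal, untrained sketch of the decoding loop such a policy might run. It assumes that each operation (job j, machine m) gets a key built from the processing-time matrix alone, that a context vector scores the keys by scaled dot product, and that already-scheduled operations are masked out before a softmax. The two-feature keys, the fixed `query` vector, and the helper names `masked_softmax` and `greedy_decode` are hypothetical and are not the authors' implementation. A trained model would compute both keys and query with learned multi-head attention layers.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Placement = Tuple[int, int, float, float]  # (job, machine, start, end)


def masked_softmax(logits: Sequence[float], mask: Sequence[bool]) -> List[float]:
    """Softmax over entries where mask is True; masked entries get probability 0."""
    top = max(x for x, ok in zip(logits, mask) if ok)  # shift by max for stability
    exps = [math.exp(x - top) if ok else 0.0 for x, ok in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_decode(p: Matrix, query: Sequence[float]) -> Tuple[List[Placement], float]:
    """Schedule one operation per step, chosen by masked scaled dot-product scores.

    Hypothetical key for operation (j, m): [p[j][m] / p_max, earliest_start / total_work].
    `query` stands in for the decoder context vector a trained model would produce.
    """
    n_jobs, n_mach = len(p), len(p[0])
    ops = [(j, m) for j in range(n_jobs) for m in range(n_mach)]
    p_max = max(max(row) for row in p)
    total_work = sum(sum(row) for row in p)
    job_free = [0.0] * n_jobs   # time at which each job becomes idle
    mach_free = [0.0] * n_mach  # time at which each machine becomes idle
    done = [False] * len(ops)
    scale = math.sqrt(len(query))
    schedule: List[Placement] = []
    for _ in range(len(ops)):
        logits = []
        for j, m in ops:
            est = max(job_free[j], mach_free[m])
            key = (p[j][m] / p_max, est / total_work)
            logits.append(sum(qi * ki for qi, ki in zip(query, key)) / scale)
        probs = masked_softmax(logits, [not d for d in done])
        idx = max(range(len(ops)), key=probs.__getitem__)  # greedy: most probable op
        j, m = ops[idx]
        # Start as early as both the job and the machine allow, so no job or
        # machine ever processes two operations at once (feasible by construction).
        start = max(job_free[j], mach_free[m])
        end = start + p[j][m]
        job_free[j] = mach_free[m] = end
        done[idx] = True
        schedule.append((j, m, start, end))
    return schedule, max(job_free)


if __name__ == "__main__":
    p = [[3.0, 1.0, 4.0], [1.0, 5.0, 9.0], [2.0, 6.0, 5.0]]
    schedule, makespan = greedy_decode(p, query=[-1.0, -4.0])
    for job, machine, start, end in schedule:
        print(f"job {job} on machine {machine}: [{start}, {end})")
    print("makespan:", makespan)
```

Each operation is appended at the earliest time its job and machine are both free, so any selection order yields a feasible schedule. The learned part of the policy therefore only decides the order. During training, sampling from `probs` instead of taking the argmax is a common alternative.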

To evaluate scalability, the trained policy is applied without retraining to randomly generated instances from 40x40 to 100x100 and compared against classical dispatching heuristics, including SPT, LPT, MWKR, and EST. Across these large instances, the Transformer achieved average gaps of 12.89-15.12% relative to a standard lower bound. Compared with EST, the Transformer remained competitive, typically within a modest margin, while substantially outperforming SPT and LPT.
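
The baseline comparison and the gap metric can be made concrete with a short sketch. It rests on three assumptions that the text does not state. First, the "standard lower bound" is taken to be the classical open-shop bound: the larger of the longest job (row sum of processing times) and the busiest machine (column sum). Second, each rule picks the unscheduled operation with the best priority key and places it at its earliest feasible start. Third, random instances draw integer processing times uniformly from 1 to 99, as is common for Taillard-style instances. The exact rule definitions, tie-breaking, and instance generator used in the study may differ, and `lower_bound`, `dispatch`, and `gap_percent` are hypothetical names.

```python
import random
from typing import Callable, Dict, List, Tuple

Matrix = List[List[int]]


def lower_bound(p: Matrix) -> int:
    """Classical OSSP bound: no schedule is shorter than the longest job or busiest machine."""
    max_job = max(sum(row) for row in p)
    max_machine = max(sum(col) for col in zip(*p))
    return max(max_job, max_machine)


def dispatch(p: Matrix, rule: str) -> int:
    """Build a schedule one operation at a time under a priority rule; return the makespan."""
    n_jobs, n_mach = len(p), len(p[0])
    job_free = [0] * n_jobs
    mach_free = [0] * n_mach
    remaining_work = [sum(row) for row in p]  # unscheduled work per job (for MWKR)
    todo = {(j, m) for j in range(n_jobs) for m in range(n_mach)}
    # Assumed priority keys (smaller is better); trailing (j, m) makes ties deterministic.
    keys: Dict[str, Callable[[int, int, int], Tuple[int, ...]]] = {
        "SPT": lambda j, m, est: (p[j][m], est, j, m),
        "LPT": lambda j, m, est: (-p[j][m], est, j, m),
        "MWKR": lambda j, m, est: (-remaining_work[j], est, j, m),
        "EST": lambda j, m, est: (est, p[j][m], j, m),
    }
    rank = keys[rule]
    while todo:
        j, m = min(todo, key=lambda op: rank(op[0], op[1], max(job_free[op[0]], mach_free[op[1]])))
        start = max(job_free[j], mach_free[m])  # earliest time job and machine are both idle
        job_free[j] = mach_free[m] = start + p[j][m]
        remaining_work[j] -= p[j][m]
        todo.remove((j, m))
    return max(job_free)


def gap_percent(makespan: int, lb: int) -> float:
    """Relative gap to the lower bound, in percent."""
    return 100.0 * (makespan - lb) / lb


if __name__ == "__main__":
    rng = random.Random(0)
    n = 20  # small demo size; the rules apply unchanged to 40x40 through 100x100
    p = [[rng.randint(1, 99) for _ in range(n)] for _ in range(n)]
    lb = lower_bound(p)
    for rule in ("SPT", "LPT", "MWKR", "EST"):
        cmax = dispatch(p, rule)
        print(f"{rule:5s} makespan={cmax} gap={gap_percent(cmax, lb):.2f}%")
```

For the 2x2 matrix `[[3, 1], [2, 4]]`, the bound is 6 because job 1 needs 2 + 4 = 6 time units. The EST rule as written above reaches that bound. This pure-Python dispatcher scans every remaining operation at each step, so its cost grows quadratically with the number of operations. A 100x100 instance (10,000 operations) would call for a heap-based or vectorized implementation.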

These results indicate that a Transformer policy trained on small OSSP instances can generalize to substantially larger problems and provide a feature-light, learning-based alternative to classical dispatching rules.