A Structural Threshold in Decision Capacity Governs Collapse in Self-Play Reinforcement Learning

Abstract: We show that a threshold in decision capacity determines whether self-play reinforcement learning agents collapse under asymmetric rule perturbations. Across poker variants, matrix games, a dice game, and multiple learning algorithms, eliminating all positive-reach contingent decisions causes rapid convergence to a deterministic exploitation attractor, a fixed point at near-maximal loss.

Preserving even a single positive-reach contingent decision point prevents this collapse. Frozen-baseline and fixed-opponent controls confirm that the mechanism is co-adaptation under constraint, not the perturbation itself. The phenomenon is timing-invariant, fully reversible upon action restoration, and intensifies under function approximation.

These results establish a sharp threshold at zero reach-weighted contingent action capacity: collapse occurs when this capacity is zero, and in the tested domains its severity scales continuously with the capacity.
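To make the threshold concrete, the sketch below computes one possible reach-weighted capacity. The abstract does not state the formula, so everything here is an assumption for illustration only: the `DecisionPoint` type, the choice of `reach * (num_actions - 1)` as the per-point contribution, and the example numbers are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DecisionPoint:
    # Probability that play reaches this decision point (assumed to be given).
    reach: float
    # Number of legal actions left at this point after the rule perturbation.
    num_actions: int


def reach_weighted_capacity(points: List[DecisionPoint]) -> float:
    """Hypothetical measure: sum of reach * (num_actions - 1).

    A point with a single legal action offers no choice and contributes 0,
    as does any point that is never reached (reach == 0).
    """
    return sum(p.reach * max(p.num_actions - 1, 0) for p in points)


def has_contingent_decision(points: List[DecisionPoint]) -> bool:
    """True if at least one reachable point still offers a real choice."""
    return any(p.reach > 0 and p.num_actions > 1 for p in points)


# Illustrative configurations (made-up numbers).
unconstrained = [DecisionPoint(0.5, 3), DecisionPoint(0.25, 2)]
one_left = [DecisionPoint(0.5, 1), DecisionPoint(0.25, 2)]
# The last point keeps 3 actions but is never reached, so it adds nothing.
fully_constrained = [
    DecisionPoint(0.5, 1),
    DecisionPoint(0.25, 1),
    DecisionPoint(0.0, 3),
]

for name, pts in [
    ("unconstrained", unconstrained),
    ("one_left", one_left),
    ("fully_constrained", fully_constrained),
]:
    print(name, reach_weighted_capacity(pts), has_contingent_decision(pts))
```

Under this assumed definition, only the fully constrained configuration has zero capacity, which corresponds to the regime the abstract associates with collapse. The configuration with a single remaining reachable choice stays above the threshold.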
