Quantum Frog: Emergent Cooperation and Difficulty Scaling in a Quantized-Time Cooperative Game
Abstract: We introduce Quantum Frog, a two-player cooperative game built on a novel quantized-time mechanic in which the environment advances only when a player acts. Inspired by the classic arcade game Frogger, Quantum Frog requires two frogs to cross an 8×8 grid of traffic and reach the far side together. We use reinforcement learning (RL) as an analytical lens to answer four design questions: (1) how does game difficulty scale with traffic density, (2) what is the optimal single-agent policy and why, (3) how large is the cooperation gap between independent and cooperative two-agent play, and (4) what joint strategy emerges when agents are incentivised to cooperate?
We train agents through five escalating stages, including Tabular Q-Learning, Deep Q-Network (DQN), Independent DQN (IDQN), and Multi-Agent Proximal Policy Optimisation (MAPPO) with a centralised critic, and evaluate each stage at traffic densities of one to six cars.
Our key findings are: (i) the quantized-time mechanic makes a rush strategy (moving directly upward at every step) universally optimal, because it minimises the time a frog is exposed to traffic; (ii) adding an uncoordinated second player makes the game harder than sextupling the traffic does for a single expert player; (iii) cooperative training recovers 32–34 percentage points of joint success rate relative to independent agents and reduces episode length from ~90 to ~6 steps; and (iv) the emergent cooperative strategy is synchronised rushing rather than complex positional coordination, illustrating that shared incentives alone suffice to align agents in time-critical cooperative tasks.
These findings provide concrete, empirically grounded guidance for the commercial design of Quantum Frog and offer broader insights into the role of environment mechanics in shaping multi-agent learning dynamics.
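As an illustration of the quantized-time mechanic described in the abstract, the following is a minimal single-frog sketch in Python, not the authors' implementation. The abstract specifies only the 8×8 grid and the rule that the environment advances when a player acts. The class name `QuantizedTimeFrog`, the start row, the goal row, the car layout, the rightward wrap-around car motion, and the collision rule are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass, field

GRID = 8  # 8x8 grid, as stated in the abstract
# Moves as (dx, dy); row 0 is assumed to be the far side.
UP, DOWN, LEFT, RIGHT = (0, -1), (0, 1), (-1, 0), (1, 0)


@dataclass
class QuantizedTimeFrog:
    """Hypothetical single-frog environment; layout details are assumed."""
    n_cars: int = 3
    seed: int = 0
    frog: tuple = (GRID // 2, GRID - 1)  # assumed start: middle of bottom row
    cars: list = field(default_factory=list)
    ticks: int = 0

    def __post_init__(self):
        rng = random.Random(self.seed)
        # Assumption: cars occupy the interior rows 1..GRID-2 only.
        self.cars = [
            (rng.randrange(GRID), rng.randrange(1, GRID - 1))
            for _ in range(self.n_cars)
        ]

    def step(self, action):
        """Apply one frog move; traffic advances exactly one tick with it."""
        dx, dy = action
        x = min(max(self.frog[0] + dx, 0), GRID - 1)
        y = min(max(self.frog[1] + dy, 0), GRID - 1)
        self.frog = (x, y)
        # Quantized time: this is the only place where the cars move.
        # Assumption: every car drifts one cell right and wraps around.
        self.cars = [((cx + 1) % GRID, cy) for cx, cy in self.cars]
        self.ticks += 1
        if self.frog in self.cars:
            return "hit"
        if y == 0:
            return "goal"
        return "alive"


def rush(env):
    """Rush strategy from the abstract: move straight up at every step."""
    outcome = "alive"
    while outcome == "alive":
        outcome = env.step(UP)
    return outcome, env.ticks
```

Because traffic moves only inside `step`, a frog that acts fewer times sees fewer traffic updates, which is the intuition behind finding (i). Calling `rush(QuantizedTimeFrog(n_cars=3))` returns the outcome and the number of ticks consumed under these assumed dynamics. The step counts it produces belong to this sketch and are not comparable to the figures reported in the abstract.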