Elo-Disentangled Player-Style Embeddings for Human Chess via a Rating-Conditioned Residual Move Model

Abstract: We study representation learning for individual human chess style: we learn a per-player embedding from a player’s move history such that inner products between embeddings measure stylistic similarity, while the embedding remains approximately disentangled from playing strength (Elo).

Our key design is a residual formulation: a rating-conditioned base move model (Maia-3 policy logits plus Stockfish-derived features, scored over Maia-2-proposed candidates) captures what a typical player of a given strength would play, and a frozen copy of it anchors a learned move encoder and a per-player vector z, so that z explains only deviations from rating-typical play.
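The residual formulation can be illustrated with a minimal sketch. The abstract does not specify how the base scores and the player term are combined, so the additive-logit form below, with a player term given by the inner product of a move encoding and z, is an assumption. The names `residual_move_log_probs` and `move_embeddings` and all numeric values are hypothetical.

```python
import numpy as np

def log_softmax(x):
    """Numerically stable log-softmax over a 1-D array of logits."""
    shifted = x - np.max(x)
    return shifted - np.log(np.sum(np.exp(shifted)))

def residual_move_log_probs(base_logits, move_embeddings, z):
    """Log-probabilities over K candidate moves for one decision.

    base_logits:     (K,) scores from the frozen rating-conditioned base
    move_embeddings: (K, d) learned encoding of each candidate move
    z:               (d,) per-player style vector

    ASSUMPTION: the player term is added to the base logits as an
    inner product; the source only states that z explains deviations
    from rating-typical play.
    """
    residual = move_embeddings @ z  # per-candidate deviation score
    return log_softmax(base_logits + residual)

rng = np.random.default_rng(0)
base_logits = np.array([2.0, 1.0, 0.5, -1.0])   # hypothetical base scores
move_embeddings = rng.normal(size=(4, 8))       # hypothetical encoder output
z_zero = np.zeros(8)                            # a "rating-typical" player
z_player = 0.5 * rng.normal(size=8)             # a player with some style

base_lp = log_softmax(base_logits)
lp_zero = residual_move_log_probs(base_logits, move_embeddings, z_zero)
lp_player = residual_move_log_probs(base_logits, move_embeddings, z_player)
```

Under this assumed form, a player with z = 0 reproduces the base distribution exactly, so any nonzero z can only encode departures from what the base already predicts for that rating.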

The base model reduces negative log-likelihood (NLL) on move prediction by 27-37% relative to the strong Maia-3 policy across the rating spectrum, with the largest gains at the top (2800+); Stockfish’s marginal value grows monotonically with Elo (negligible at 900-1200, +0.085 nats at 2800+).

On a shared Elo-stratified benchmark of 22,620 held-out decisions, top-1 move-matching rises monotonically from Maia-2 to Maia-3 to the Stockfish-augmented base (0.51 -> 0.57 -> 0.68): the base improves top-1 by 33% relative to Maia-2 and by 19% relative to Maia-3 (with 30% lower NLL than Maia-3), and the lift from engine features is largest at high Elo.
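The relative top-1 figures follow directly from the three reported accuracies. The short check below uses only the numbers quoted above; the helper name `relative_gain` is introduced here for illustration.

```python
def relative_gain(new, old):
    """Relative improvement of `new` over `old`, as a fraction of `old`."""
    return (new - old) / old

# Top-1 move-matching accuracies as reported in the abstract.
top1 = {"Maia-2": 0.51, "Maia-3": 0.57, "base": 0.68}

gain_vs_maia2 = relative_gain(top1["base"], top1["Maia-2"])
gain_vs_maia3 = relative_gain(top1["base"], top1["Maia-3"])

print(f"base vs Maia-2: {gain_vs_maia2:.0%}")  # 0.17 / 0.51
print(f"base vs Maia-3: {gain_vs_maia3:.0%}")  # 0.11 / 0.57
```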

The player embedding adds little to raw move-matching on top of this base: its marginal top-1 gain falls within the 95% confidence interval and is thus not statistically significant. Its value is instead representational: z generalizes to held-out decisions without overfitting, re-identifies players from disjoint games above chance, and a linear probe recovers rating from z with only R^2 = 0.06 (a nonlinear probe does no better), which is evidence that it captures style on an Elo-orthogonal axis.
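A linear probe of this kind can be sketched as an ordinary least-squares regression from z to rating, scored by R^2 on held-out players. The data below are synthetic and serve only to show the two extremes (embeddings independent of rating versus embeddings that leak rating); the dimensions, split sizes, and the function name `linear_probe_r2` are assumptions and do not come from the paper.

```python
import numpy as np

def linear_probe_r2(z_train, y_train, z_test, y_test):
    """Fit least-squares y ~ z + bias on train; return R^2 on test."""
    a_train = np.hstack([z_train, np.ones((len(z_train), 1))])
    a_test = np.hstack([z_test, np.ones((len(z_test), 1))])
    weights, *_ = np.linalg.lstsq(a_train, y_train, rcond=None)
    pred = a_test @ weights
    ss_res = np.sum((y_test - pred) ** 2)
    ss_tot = np.sum((y_test - y_test.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(0)
n_players, dim, n_train = 2000, 16, 1500        # hypothetical sizes
z = rng.normal(size=(n_players, dim))           # synthetic embeddings

# Case 1: rating drawn independently of z (fully disentangled).
elo_independent = 1500 + 300 * rng.normal(size=n_players)
# Case 2: rating is a linear function of one coordinate of z (leaky).
elo_leaky = 1500 + 300 * z[:, 0]

r2_independent = linear_probe_r2(
    z[:n_train], elo_independent[:n_train],
    z[n_train:], elo_independent[n_train:])
r2_leaky = linear_probe_r2(
    z[:n_train], elo_leaky[:n_train],
    z[n_train:], elo_leaky[n_train:])

print(f"independent: R^2 = {r2_independent:.3f}")
print(f"leaky:       R^2 = {r2_leaky:.3f}")
```

A held-out R^2 near zero, as in the independent case, is what a rating-disentangled embedding should produce; the reported 0.06 sits close to that end of the range.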

We argue that a strong rating-conditioned base plus a compact, Elo-disentangled embedding, which together separate typical play from individual deviation, is an economical, interpretable model of individual style and an alternative to per-player preference fine-tuning.
