I Built 11 Models to Predict the 2026 World Cup. They Crown Four Different Champions.

Data Science: A single model hands you a single answer and no sense of how much it hinges on the dozens of choices buried inside it.

Ari Joury, PhD | Jun 15, 2026 | 11 min read

Football isn’t easy to predict, even with world-class data. There are 48 teams in the 2026 World Cup, 104 matches, and roughly as many confident predictions as there are fans. Building a model that announces “Team X wins, probability p” is easy: an afternoon’s work with public data and a Poisson distribution. The trap is believing the number.

A single model hands you a single answer and no sense of how much it hinges on the dozens of choices buried inside it: which rating system, which goal distribution, which learning algorithm. Change any one of them and the “answer” can move by double digits.

So instead of trusting one model, I built eleven, one for (almost) every chapter of a machine-learning textbook. I trained or computed them all on the same real match data, ran each through the same tournament simulator, and let them argue. The suite is three rating systems (Elo, Colley, PageRank), two goal models (Poisson, Negative Binomial), five classifiers (logistic regression, KNN, random forest, XGBoost, a neural network), and the betting market as a benchmark.

Same 48 teams, same data, eleven methods. They crown four different champions, and that disagreement, not the consensus, turns out to be the most useful thing a suite of models can give you. This article is about how to build it and how to read it. (If you just want a single clean forecast, the Elo-plus-Poisson version is its own short article; here we’re after something more honest than one number.)

The data

Everything is fit on 358 real international matches: every game from the 2010–2022 World Cups (256 matches) plus the 2020 and 2024 European Championships (102 matches), pulled from the openfootball project, specifically its worldcup.json and euro.json datasets, which are dedicated to the public domain. The classifiers learn a mapping from match features to results on these games; the rating systems are computed directly from the results graph.

The field is the real, confirmed 2026 draw: 48 teams, 12 groups. Three features describe each match from the “home” (first-named, neutral-venue) team’s perspective: the strength gap between the teams, their combined strength, and a knockout flag. The target is the three-way result (win / draw / loss).
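As a rough illustration of that feature layout, here is a minimal sketch. The function names, the use of a single scalar strength per team, and the choice of a plain sum for "combined strength" are all assumptions made for the example; the article does not specify them.

```python
import numpy as np

def match_features(strength_home, strength_away, is_knockout):
    """Three features from the first-named team's perspective.

    Assumes each team has one scalar strength (e.g. a rating) and that
    'combined strength' is the plain sum; both are illustrative choices.
    """
    gap = strength_home - strength_away       # signed strength gap
    combined = strength_home + strength_away  # overall quality of the pairing
    return np.array([gap, combined, float(is_knockout)])

def encode_result(goals_home, goals_away):
    """Three-way target: 0 = win, 1 = draw, 2 = loss for the first-named team."""
    if goals_home > goals_away:
        return 0
    if goals_home == goals_away:
        return 1
    return 2

x = match_features(1800.0, 1700.0, is_knockout=True)
y = encode_result(2, 1)
```

Swapping the two teams flips the sign of the gap and swaps win and loss, which is a quick sanity check on any encoding of this kind.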

One interface, eleven engines

The only way to race different model families fairly is to force them through the same contract: given two teams, return P(win), P(draw), P(loss) plus an expected goal difference for group-stage tiebreakers. Everything downstream is identical across models: the 12 groups, the best-third qualification, and the 32-team knockout. The simulator is even vectorized so that all 20,000 tournaments per model run as NumPy array operations rather than Python loops.
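One way such a contract could look in Python is sketched below. The names `MatchModel`, `CoinFlipModel`, and `sample_results` are hypothetical, and the toy model returns fixed probabilities purely to exercise the interface. The point is that one match is sampled for all simulated tournaments in a single array call rather than a loop.

```python
from typing import Protocol, Tuple
import numpy as np

class MatchModel(Protocol):
    """Hypothetical shared contract: every engine answers the same question."""
    def predict(self, team_a: str, team_b: str) -> Tuple[float, float, float, float]:
        """Return (P(win), P(draw), P(loss), expected goal difference) for team_a."""
        ...

class CoinFlipModel:
    """Toy stand-in that ignores the teams; only here to exercise the interface."""
    def predict(self, team_a, team_b):
        return (0.375, 0.25, 0.375, 0.0)

def sample_results(model, team_a, team_b, n_sims, rng):
    """Sample one match across all simulated tournaments at once.

    Returns an integer array of length n_sims: 0 = win, 1 = draw, 2 = loss.
    """
    p_win, p_draw, p_loss, _ = model.predict(team_a, team_b)
    return rng.choice(3, size=n_sims, p=[p_win, p_draw, p_loss])

rng = np.random.default_rng(0)
outcomes = sample_results(CoinFlipModel(), "A", "B", 20_000, rng)
shares = np.bincount(outcomes, minlength=3) / outcomes.size
```

Because the simulator only ever calls `predict`, a rating system, a goal model, and a classifier can be swapped in without touching the group or knockout logic.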

The rating models: strength from results

Elo is the one most people have heard of: the chess rating, adapted for football. It is a self-correcting number updated after each match by R’ = R + K(S − E), where S is the actual result (1 for a win, 0.5 for a draw, 0 for a loss) and the win expectancy is E = 1/(1 + 10^(−Δ/400)) for a rating gap Δ (own rating minus the opponent’s). To get a match probability we run the Elo gap through that logistic curve and split off a draw probability fit separately.
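The update rule and the expectancy curve translate almost directly into code. In the sketch below, the K value of 20 and the way the draw probability is carved out (scaling win and loss by 1 minus the draw probability) are assumptions for illustration; the article only says the draw probability is fit separately.

```python
def expected_score(rating_gap):
    """Win expectancy E = 1 / (1 + 10^(-gap/400)) for gap = own minus opponent."""
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / 400.0))

def elo_update(rating, opponent, score, k=20.0):
    """R' = R + K (S - E); score is 1.0 for a win, 0.5 for a draw, 0.0 for a loss.

    k=20 is an assumed value, not one taken from the article.
    """
    return rating + k * (score - expected_score(rating - opponent))

def elo_probs(rating_gap, draw_prob):
    """Turn a rating gap into (P(win), P(draw), P(loss)).

    Assumed scheme: the draw probability is supplied from outside and the
    remaining mass is split by the Elo expectancy.
    """
    e = expected_score(rating_gap)
    return ((1.0 - draw_prob) * e, draw_prob, (1.0 - draw_prob) * (1.0 - e))

new_rating = elo_update(1500.0, 1500.0, 1.0)   # evenly matched team wins
probs = elo_probs(100.0, draw_prob=0.25)
```

Two evenly rated teams have E = 0.5, so a win moves the rating by exactly K/2.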

Colley ratings drop the temporal updating entirely and solve a single linear system, Cr = b. Each diagonal entry of C is 2 plus the number of games that team has played, each off-diagonal entry is minus the number of games between the two teams, and b_i = 1 + (wins_i − losses_i)/2. The +2 on the diagonal is a Laplace-style prior that makes the system strictly diagonally dominant and therefore always solvable; every team is implicitly seeded at 0.5 before any games. Colley is elegant precisely because it has no free parameters and no notion of “current form”: it’s a pure, closed-form summary of who beat whom.
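A minimal sketch of the Colley solve follows, using the standard Colley matrix (2 plus games played on the diagonal, minus the head-to-head count off it). The function name and match format are hypothetical, and treating a draw as half a win and half a loss is an assumption, since the original method was defined for win/loss results.

```python
import numpy as np

def colley_ratings(n_teams, matches):
    """Solve the Colley system C r = b.

    matches: iterable of (i, j, result) with team indices i and j and
    result = 1.0 if i won, 0.5 for a draw (assumed convention), 0.0 if i lost.
    """
    C = 2.0 * np.eye(n_teams)   # the +2 prior on the diagonal
    b = np.ones(n_teams)        # b_i starts at 1, i.e. a 0.5 rating with no games
    for i, j, result in matches:
        C[i, i] += 1.0
        C[j, j] += 1.0
        C[i, j] -= 1.0
        C[j, i] -= 1.0
        b[i] += result - 0.5    # +0.5 per win, -0.5 per loss, 0 per draw
        b[j] += 0.5 - result
    return np.linalg.solve(C, b)

no_games = colley_ratings(3, [])
one_game = colley_ratings(2, [(0, 1, 1.0)])   # team 0 beats team 1
```

With no games every team sits at 0.5, and a single win moves the pair to 0.625 and 0.375; the ratings always average to 0.5.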

PageRank treats the season as a directed graph. Every match adds weight to an edge from the loser to the winner (draws split the weight both ways), so pointing at a team is an endorsement. A team scores highly if strong teams “point to” it, i.e., lost to it. It’s the same algorithm Google used to rank web pages, applied to football results.
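The loser-to-winner graph can be ranked with a plain power iteration, sketched below. The damping factor of 0.85, the iteration count, and the handling of unbeaten teams (their weight is spread uniformly) are assumptions for the example, not details from the article.

```python
import numpy as np

def pagerank_ratings(n_teams, matches, damping=0.85, n_iter=100):
    """PageRank on a results graph with edges pointing from loser to winner.

    matches: iterable of (i, j, result) with result = 1.0 if i won,
    0.5 for a draw, 0.0 if i lost. damping=0.85 is an assumed value.
    """
    W = np.zeros((n_teams, n_teams))   # W[a, b] = weight of edge a -> b
    for i, j, result in matches:
        W[j, i] += result              # j endorses i by the share i took
        W[i, j] += 1.0 - result        # and vice versa (a draw gives 0.5 each way)
    out = W.sum(axis=1, keepdims=True)
    # Row-normalise; a team with no outgoing weight (never dropped points)
    # spreads its rank uniformly, the usual dangling-node fix.
    P = np.where(out > 0, W / np.where(out > 0, out, 1.0), 1.0 / n_teams)
    r = np.full(n_teams, 1.0 / n_teams)
    for _ in range(n_iter):
        r = (1.0 - damping) / n_teams + damping * (r @ P)
    return r

# Team 0 beats 1 and 2; team 1 beats 2.
ranks = pagerank_ratings(3, [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0)])
```

In this three-team example the scores sum to 1 and order the teams 0, 1, 2, matching the results.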

The goal models: Poisson and Negative Binomial

These model scorelines, not outcomes. I fit a Poisson GLM with a log link on the real match goals, stacking each match as two observations (each team’s goals vs. its signed strength gap). From λ_home, λ_away we recover P(W/D/L) by forming the outer product of the two teams’ Poisson goal distributions and summing the cells where the home team scores more, equal, or fewer goals.
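The outer-product step can be sketched as follows, taking the two scoring rates as given (the GLM fit itself is not shown). The function name, the cap of 10 goals per team, and the renormalisation of the truncated grid are assumptions for the example.

```python
import numpy as np

def wdl_from_rates(lam_home, lam_away, max_goals=10):
    """P(win), P(draw), P(loss) for the home team from two Poisson rates.

    Assumes independent Poisson goal counts, truncated at max_goals.
    """
    goals = np.arange(max_goals + 1)
    # k! for k = 0..max_goals, built as a running product
    factorials = np.cumprod(np.concatenate(([1.0], goals[1:])))
    pmf_home = np.exp(-lam_home) * lam_home ** goals / factorials
    pmf_away = np.exp(-lam_away) * lam_away ** goals / factorials
    grid = np.outer(pmf_home, pmf_away)   # grid[i, j] = P(home scores i, away scores j)
    grid /= grid.sum()                    # renormalise the truncated tail
    p_win = np.tril(grid, -1).sum()       # home goals > away goals (below the diagonal)
    p_draw = np.trace(grid)               # equal scores (the diagonal)
    p_loss = np.triu(grid, 1).sum()       # home goals < away goals (above the diagonal)
    return p_win, p_draw, p_loss

p_win, p_draw, p_loss = wdl_from_rates(1.6, 1.1)
```

With equal rates the grid is symmetric, so win and loss probabilities coincide and everything left over is the draw.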