Inside soccer’s data renaissance
Imagine tuning in to the opening kickoff of a World Cup match and seeing a player intentionally send the ball all the way down the pitch and right out of bounds on the opponent’s end. Casual fans might scratch their heads. Where’s the logic in surrendering possession seconds into a game? If you were Jesse Davis, though, you’d know that this play could be a prime setup to score.
Davis is a professor of computer science at KU Leuven in Belgium and head of its Sports Analytics Lab, which has been at the vanguard of a data awakening in soccer since its inception more than a decade ago. Though the research group brings machine-learning models to bear on a variety of sports, including basketball, volleyball, and field hockey, nowhere is its impact felt more than on the soccer pitch.
Davis and his team of researchers employ advanced data analytics to reveal a range of (beg your pardon) game-changing findings that are shifting pro clubs’ decision-making. “His lab is the most influential sports analytics lab in soccer,” says Hugo Rios-Neto, data recruitment lead for Royal Sporting Club Anderlecht in Belgium. The lab’s researchers have helped teams better evaluate their rosters, devised ways to assess how efficient (or not) particular strategies are, and developed algorithms that uncover hidden tactical patterns.
Take, for instance, the value of kicking the ball out of bounds close to the goal and letting your opponent throw it back into play, a move that’s been popping up in some of the world’s top leagues over the last few years. To make the statistical argument for this seemingly counterproductive move, Davis’s group built a training data set composed of more than 1.4 million passes and some 60,000 throw-ins, partly from the 2022 World Cup. They used tree ensemble models (essentially a mashup of decision trees) to simulate the tactic.
The conclusion, which the researchers presented in a 2024 paper under the apt title “Boot it”: When the ball is in the middle third of the pitch, kicking it out of bounds on your opponents’ side of the field can put you within 10 actions (think passes and dribbles) of a goal. That can be a big deal in a sport that has 1,500 or more actions per match and very little scoring. The idea, Davis explains, is that you’re setting yourself up to recover the ball in an advantageous situation.
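To make “within 10 actions of a goal” concrete, here is a minimal sketch of how a sequence of on-ball actions could be labeled by whether the acting team scores shortly afterward. It is an illustration only, not the method from the “Boot it” paper: the `Action` record, its fields, the `label_goal_within` helper, and the sample sequence are all hypothetical.

```python
from typing import List, NamedTuple


class Action(NamedTuple):
    """One on-ball event. This schema is hypothetical, for illustration only."""
    team: str      # team performing the action
    kind: str      # e.g. "pass", "dribble", "shot"
    is_goal: bool  # True only for a shot that scores


def label_goal_within(actions: List[Action], horizon: int = 10) -> List[bool]:
    """For each action, report whether the acting team scores inside a
    window of `horizon` actions that starts at that action."""
    labels = []
    for i, act in enumerate(actions):
        window = actions[i:i + horizon]
        labels.append(any(a.is_goal and a.team == act.team for a in window))
    return labels


# Made-up sequence: team A boots the ball out, team B throws it in,
# team A wins it back and scores three actions later.
sequence = [
    Action("A", "clearance", False),
    Action("B", "throw_in", False),
    Action("A", "interception", False),
    Action("A", "pass", False),
    Action("A", "shot", True),
]

labels = label_goal_within(sequence)
for act, label in zip(sequence, labels):
    print(f"{act.team} {act.kind:<12} scores within 10 actions: {label}")
```

Labels like these are the kind of target a model could be trained to predict from the game state, which is one way a surrendered throw-in can end up looking valuable on paper.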
Beyond providing discrete game-day insights, Davis also occupies a unique niche in the world of sports analytics, where many clubs now hire their own internal data teams to maintain a competitive edge. He makes most of his research freely available via open-source analytics tools, but the academic life also affords him the freedom to tackle more complex problems, such as standardizing in-game data, a project that will make it easier to parse game footage and come up with winning strategies.
Davis, 45, grew up in Wisconsin and spent his childhood enraptured by basketball and (American) football. Soccer was largely a nonentity to him until college, when the 2002 World Cup, in which Brazil famously swept the tournament, reeled him in. But the notion of going on to dissect the sport never crossed his mind. His doctoral studies in computer science at the University of Wisconsin–Madison had him working with radiologists to analyze mammography reports.
In October 2010, he joined KU Leuven as a computer science professor looking at the intersection of AI and health care, with a focus on monitoring athletic performance. His research team studied, for instance, how to combine heart rate with other metrics to determine whether someone was overtraining. They also dove into the biomechanics of running.
The tactical and technical aspects of sports, and soccer specifically, became the subject of Davis’s professorial work when he hired Jan Van Haaren, an engineering student focused on artificial intelligence and a self-described soccer fanatic. Van Haaren wondered if data analysis could be used to study things like passing, shooting, and ball progression, metrics the game was only just beginning to digitally crunch at the time.
You need not be well versed in the moneyball-ization of pro sports to see that it’s relatively easy to apply deep statistical work to baseball or basketball. You can isolate actions like jump shots and assign value to ones taken close or far away.
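As a rough illustration of what assigning value to a shot can mean, the sketch below computes expected points per attempt as make probability times points awarded. The 38% shooting figure is a hypothetical number chosen for the example, not a statistic from any player or league, and the function names are invented here.

```python
def expected_points(make_probability: float, points: int) -> float:
    """Expected points from one attempt: chance of making it times its value."""
    return make_probability * points


def break_even_two_point_pct(three_point_pct: float) -> float:
    """Two-point percentage needed to match a given three-point percentage."""
    return three_point_pct * 3 / 2


# Hypothetical shooter who converts 38% from mid-range and from behind the arc.
shooting_pct = 0.38

mid_range = expected_points(shooting_pct, 2)
three_pointer = expected_points(shooting_pct, 3)
needed_mid_range_pct = break_even_two_point_pct(shooting_pct)

print(f"Mid-range jumper: {mid_range:.2f} expected points per shot")
print(f"Three-pointer:    {three_pointer:.2f} expected points per shot")
print(f"Mid-range accuracy needed to break even: {needed_mid_range_pct:.0%}")
```

Under that assumption the same accuracy is worth half again as many points from behind the arc, and the mid-range shot only catches up if it falls about 57% of the time.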
Soon a basketball coach realizes that a player who can’t make a layup, but shoots roughly as well from the three-point line as on mid-range jumpers, might as well go for the shot that gets more points. Soccer, by comparison, seemed like a poor candidate for that kind of analysis. “The vast, vast majority of actions really don’t lead to the outcome of a goal or even a shot,” says Rios-Neto. “So it’s hard to elaborate or derive a winning strategy from the data.”
But Van Haaren’s love of the sport, and Davis’s love of sports in general, inspired them to try. Over time, Davis realized that machine learning and other artificial-intelligence tools lent themselves well to the complexity, fluidity, and speed of soccer. In 2014, he officially stood up the Sports Analytics Lab.
With a stable of about 10 students and postdocs at any one time, the lab began laying what Van Haaren calls the “intellectual foundations of how the game is analyzed today.” The researchers picked apart in-game actions, and suddenly they were valuing ball possession, penalty-kick strategy (aim for the center), and the merits of long shots on goal (take them). “One of the trends that’s been in soccer over the last five to 10 years is that the number of long shots has dramatically increased,” says Davis. “What the data let you do is really quantify what the probabilities of those things are.”