Shape, Symmetries, and Structure: The Changing Role of Mathematics in Machine Learning Research

What is the Role of Mathematics in Modern Machine Learning?

The past decade has witnessed a shift in how progress is made in machine learning. Research involving carefully designed and mathematically principled architectures results in only marginal improvements, while compute-intensive, engineering-first efforts that scale to ever larger training sets and model parameter counts produce remarkable new capabilities unpredicted by existing theory. Mathematics and statistics, once the primary guides of machine learning research, now struggle to provide immediate insight into the latest breakthroughs. This is not the first time that empirical progress in machine learning has outpaced more theory-motivated approaches, yet the magnitude of recent advances has forced us to swallow the bitter pill of the “Bitter Lesson” yet again [1].

This shift has prompted speculation about mathematics’ diminished role in machine learning research moving forward. It is already evident that mathematics will have to share the stage with a broader range of perspectives (for instance, biology, which has deep experience drawing conclusions about irreducibly complex systems, or the social sciences, as AI is integrated ever more deeply into society). The increasingly interdisciplinary nature of machine learning should be welcomed as a positive development by all researchers.

However, we argue that mathematics remains as relevant as ever; its role is simply evolving. For example, whereas mathematics might once have primarily provided theoretical guarantees on model performance, it may soon be more commonly used for post-hoc explanations of empirical phenomena observed in model training and performance, a role analogous to the one it plays in physics. Similarly, while mathematical intuition might once have guided the design of handcrafted features or architectural details at a granular level, its use may shift to higher-level design choices, such as matching architecture to underlying task structure or data symmetries.

None of this is completely new. Mathematics has always served multiple purposes in machine learning. After all, the translation equivariant convolutional neural network, which exemplifies the idea of architecture matching data symmetries mentioned above, is now over 40 years old. What’s changing are the kinds of problems where mathematics will have the greatest impact and the ways it will most commonly be applied.
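The translation equivariance mentioned above can be verified directly: convolving a shifted signal gives the same result as shifting the convolved signal. The sketch below illustrates this with a circular 1-D convolution in NumPy; the signal, kernel, and circular boundary condition are illustrative choices, not part of any particular architecture.

```python
import numpy as np

def conv1d_circular(x, w):
    """Circular 1-D convolution (correlation) of signal x with kernel w."""
    n, k = len(x), len(w)
    return np.array([sum(w[j] * x[(i + j) % n] for j in range(k))
                     for i in range(n)])

def shift(x, s):
    """Translate a signal s positions to the right, with wrap-around."""
    return np.roll(x, s)

x = np.array([1.0, 2.0, 3.0, 4.0, 0.0, -1.0])  # toy signal
w = np.array([0.5, -1.0, 0.5])                  # toy kernel

# Equivariance: convolving a shifted input equals shifting the convolved output.
lhs = conv1d_circular(shift(x, 2), w)
rhs = shift(conv1d_circular(x, w), 2)
assert np.allclose(lhs, rhs)
```

The same identity holds for every shift amount, which is exactly the symmetry a convolutional layer bakes into an architecture.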

An intriguing consequence of the shift towards scale is that it has broadened the scope of the fields of mathematics applicable to machine learning. “Pure” mathematical domains such as topology, algebra, and geometry are now joining the more traditionally applied fields of probability theory, analysis, and linear algebra. These pure fields have grown and developed over the last century to handle high levels of abstraction and complexity, helping mathematicians make discoveries about spaces, algebraic objects, and combinatorial processes that at first glance seem beyond human intuition. These capabilities promise to address many of the biggest challenges in modern deep learning. In this article we will explore several areas of current research that demonstrate the enduring ability of mathematics to guide the process of discovery and understanding in machine learning.

Figure 1: Mathematics can illuminate the ways that ReLU-based neural networks shatter input space into countless polygonal regions, in each of which the model behaves like a linear map [2, 3, 4]. These decompositions create beautiful patterns. (Figure made with SplineCam [5]).
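The piecewise-linear structure in Figure 1 can be made concrete. For a toy two-layer ReLU network (random weights here, purely for illustration), the pattern of active ReLUs at a point determines an affine map that reproduces the network exactly throughout that point’s region:

```python
import numpy as np

rng = np.random.default_rng(0)
# A tiny 2-layer ReLU network with random weights (illustrative only).
W1, b1 = rng.normal(size=(8, 2)), rng.normal(size=8)
W2, b2 = rng.normal(size=(1, 8)), rng.normal(size=1)

def net(x):
    """Forward pass: linear, ReLU, linear."""
    return W2 @ np.maximum(W1 @ x + b1, 0.0) + b2

def region_linear_map(x):
    """The affine map A, c that the network computes on x's activation region."""
    mask = (W1 @ x + b1 > 0).astype(float)   # which ReLUs are active at x
    A = W2 @ (W1 * mask[:, None])            # effective linear part
    c = W2 @ (b1 * mask) + b2                # effective bias
    return A, c

x = np.array([0.3, -0.2])
A, c = region_linear_map(x)
# Inside x's polygonal region, the network agrees with the affine map A @ x + c.
assert np.allclose(net(x), A @ x + c)
```

Each distinct activation pattern yields a different `(A, c)` pair, and the boundaries between patterns carve the input space into the polygonal regions shown in the figure.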

Describing an Elephant from a Pin Prick

Suppose you are given a 7-billion-parameter neural network with 50 layers and are asked to analyze it; how would you begin? The standard procedure would be to calculate relevant performance statistics, for instance the accuracy on a suite of evaluation benchmarks. In certain situations, this may be sufficient. However, deep learning models are complex and multifaceted. Two computer vision models with the same accuracy may differ greatly in their generalization to out-of-distribution data, calibration, adversarial robustness, and other “secondary statistics” that are critical in many real-world applications. Beyond this, all evidence suggests that to build a complete scientific understanding of deep learning, we will need to venture beyond evaluation scores. Indeed, just as it is impossible to capture all the dimensions of humanity with a single numerical quantity (e.g., IQ, height), trying to understand a model by one or even several statistics alone is fundamentally limiting.
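To see how accuracy alone can hide these secondary statistics, consider two hypothetical binary classifiers that make identical predictions but with different confidences. A simple expected calibration error (ECE) computation, sketched below with made-up prediction data, separates them even though their accuracies coincide:

```python
import numpy as np

def accuracy(probs, labels):
    """Fraction of correct predictions at a 0.5 threshold."""
    return float(np.mean((probs > 0.5) == labels))

def expected_calibration_error(probs, labels, n_bins=5):
    """Average gap between confidence and accuracy over confidence bins."""
    conf = np.where(probs > 0.5, probs, 1 - probs)
    correct = ((probs > 0.5) == labels).astype(float)
    bins = np.minimum((conf * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            ece += mask.mean() * abs(conf[mask].mean() - correct[mask].mean())
    return ece

labels = np.array([1, 1, 1, 0, 0, 0, 1, 0])
# Hypothetical models: identical hard predictions, different confidences.
model_a = np.array([0.9, 0.8, 0.3, 0.2, 0.1, 0.4, 0.7, 0.6])    # moderate
model_b = np.array([0.99, 0.99, 0.01, 0.01, 0.01, 0.01, 0.99, 0.99])  # overconfident

assert accuracy(model_a, labels) == accuracy(model_b, labels)
assert expected_calibration_error(model_b, labels) > expected_calibration_error(model_a, labels)
```

Both models score the same accuracy, yet the overconfident one has a markedly worse calibration error, illustrating why a single headline metric is an incomplete description.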

One difference between understanding a human and understanding a model is that we have easy access to all model parameters and all the individual computations that occur in a model. Indeed, by extracting a model’s hidden activations we can directly trace the process by which a model converts raw input into a prediction. Unfortunately, the world of hidden activations is far less hospitable than that of simple model performance statistics. Like the initial input, hidden activations are usually high dimensional, but unlike input data they are not structured in a form that humans can understand. If we venture into even higher dimensions, we can try to understand a model through its weights directly. Here, in the space of model weights, we have the freedom to move in millions to billions of orthogonal directions from a single starting point. How do we even begin to make sense of these worlds?
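The activation-tracing idea above can be sketched mechanically. The toy MLP below (random weights, chosen only for illustration) records every hidden activation produced on the way from raw input to prediction; for a real framework model, one would instead hook into the layers, but the principle is the same:

```python
import numpy as np

rng = np.random.default_rng(1)
# A small 3-layer MLP; the weights are random placeholders.
layers = [(rng.normal(size=(16, 4)), rng.normal(size=16)),
          (rng.normal(size=(16, 16)), rng.normal(size=16)),
          (rng.normal(size=(3, 16)), rng.normal(size=3))]

def forward_with_activations(x):
    """Run the MLP, recording the activation after every layer."""
    activations = []
    h = x
    for i, (W, b) in enumerate(layers):
        h = W @ h + b
        if i < len(layers) - 1:        # ReLU on hidden layers only
            h = np.maximum(h, 0.0)
        activations.append(h)
    return h, activations

out, acts = forward_with_activations(np.ones(4))
# The recorded trace exposes every intermediate step of the computation.
assert [a.shape for a in acts] == [(16,), (16,), (3,)]
```

Even in this tiny model the trace is a list of high-dimensional vectors with no human-readable structure, which is precisely the difficulty the paragraph above describes.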

There is a well-known fable in which three blind men each feel a different part of an elephant. The description that each gives of the animal is completely different, reflecting only the body part that man felt. We argue that, unlike the blind men, who can at least use their hands to feel a substantial part of one of the elephant’s body parts, current methods of analyzing the hidden activations and weights of a model are akin to trying to describe the elephant from…