Information-Theoretic Generalization Bounds for Stochastic Gradient Descent with Predictable Virtual Noise
Abstract: Information-theoretic generalization bounds analyze stochastic optimization by relating expected generalization error to the mutual information between the learned parameters and the training data. Virtual-perturbation analyses of SGD add auxiliary Gaussian noise only within the proof, making the mutual information tractable while leaving the actual SGD trajectory unchanged.
Existing bounds, however, typically require the perturbation covariances to be fixed independently of the optimization history, limiting their ability to represent geometries induced by running gradient statistics, preconditioners, curvature proxies, and other pathwise information.
We introduce predictable history-adaptive virtual perturbations, where the perturbation covariance at each iteration may depend on the past real SGD history but not on current or future randomness. This predictability enables a conditional Gaussian relative-entropy argument and yields generalization bounds for SGD with adaptive virtual-noise geometry.
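As an illustrative sketch of the argument just described (the notation here is ours, not necessarily the paper's), predictability lets the mutual information be bounded by a chain-rule sum of conditional relative entropies, each of which reduces to a Gaussian KL divergence with a shared, history-measurable covariance:

```latex
% Chain-rule bound on the mutual information between the output W_T and the
% sample S, with Q an auxiliary reference Markov kernel (illustrative symbols):
I(W_T; S) \;\le\; \sum_{t=1}^{T}
  \mathbb{E}\!\left[ D_{\mathrm{KL}}\!\big( P_{W_t \mid W_{t-1},\, S}
  \,\big\|\, Q_{W_t \mid W_{t-1}} \big) \right].
% When the virtual Gaussian perturbation covariance \Sigma_t is predictable,
% i.e. measurable with respect to the past history \mathcal{F}_{t-1}, the two
% conditional laws share the same covariance, and each increment collapses to
D_{\mathrm{KL}}\!\big( \mathcal{N}(\mu_t, \Sigma_t) \,\big\|\,
  \mathcal{N}(\nu_t, \Sigma_t) \big)
  \;=\; \tfrac{1}{2}\, (\mu_t - \nu_t)^{\top} \Sigma_t^{-1} (\mu_t - \nu_t),
% where \mu_t - \nu_t is driven by the gradient deviation at step t.
```

The shared-covariance collapse is exactly where predictability matters: if $\Sigma_t$ could depend on the current randomness, the two conditional Gaussians would have different covariances and the trace and log-determinant terms of the general Gaussian KL would reappear.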
The bounds replace the fixed sensitivity and gradient-deviation terms with conditional adaptive counterparts, include an output-sensitivity penalty from the accumulated perturbation covariance, and reduce the deviation term to a conditional variance only under conditional unbiasedness.
Since adaptive covariances may be data-dependent, we separate local Gaussian smoothing from global reference-kernel comparison. The resulting bound includes a covariance-comparison cost measuring the KL price of using an admissible reference geometry that differs from the actual adaptive covariance.
Fixed-noise-style bounds are recovered under admissible synchronization, such as deterministic, public, or prefix-observable covariance rules. The framework recovers fixed isotropic and geometry-aware bounds as special cases while extending virtual-perturbation analysis to history-dependent SGD without modifying the algorithm.
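To make the mechanism concrete, the following toy simulation sketches how such KL increments could be accumulated along an unmodified SGD run on a least-squares problem. Everything here is hypothetical scaffolding, not the paper's construction: the covariance rule (an exponential moving average of past squared gradients, updated only after the current step so that it is predictable), the reference gradient, and all constants are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy least-squares problem: per-example loss 0.5 * (x_i @ w - y_i)**2.
n, d = 64, 4
X = rng.normal(size=(n, d))
w_star = rng.normal(size=d)
y = X @ w_star + 0.1 * rng.normal(size=n)

def example_grad(w, i):
    """Gradient of the i-th example's loss."""
    return (X[i] @ w - y[i]) * X[i]

def full_grad(w):
    """Population-style reference gradient (illustrative reference kernel)."""
    return X.T @ (X @ w - y) / n

lr, beta, eps = 0.05, 0.9, 1e-8
w = np.zeros(d)
ema = np.ones(d)   # EMA of squared gradients; used BEFORE updating, so predictable
kl_sum = 0.0

for t in range(200):
    # Predictable virtual-noise covariance: a function of past history only
    # (a hypothetical diagonal rule, in the spirit of a preconditioner).
    sigma2 = lr * (ema + eps)
    i = rng.integers(n)
    g = example_grad(w, i)
    # Gradient deviation between the real update and the reference update.
    dev = lr * (g - full_grad(w))
    # Gaussian KL with shared predictable covariance: 0.5 * dev^T Sigma^{-1} dev.
    kl_sum += 0.5 * np.sum(dev**2 / sigma2)
    # The REAL SGD step is unchanged: the Gaussian noise is virtual only.
    w = w - lr * g
    # Update the history statistics for the NEXT step's covariance.
    ema = beta * ema + (1 - beta) * g**2

print(f"accumulated virtual-noise KL: {kl_sum:.3f}")
```

Note that the noise never enters the update `w = w - lr * g`; the covariance `sigma2` only prices the per-step deviation, which is the sense in which the perturbation is "virtual."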