Two-View Accumulation as the Primary Training Lever for Hybrid-Capture Gaussian Splatting: A Variance-Decomposition View of When Gradient Surgery Helps



Abstract: Hybrid-capture novel view synthesis combines images taken at substantially different camera distances (e.g., aerial drone and ground-level views). Standard 3D Gaussian Splatting (3DGS), trained for 30K iterations with one rendered view per optimizer step, under-fits the minority capture regime by 1-3 dB PSNR on five hybrid-capture benchmarks.

We isolate the lever that closes this gap. Among compute-matched alternatives (vanilla 60K iterations, magnitude correction via GradNorm, direction-aware near/far gradient surgery, projective preconditioning, confidence-gated sample-level surgery, and a random two-view-per-step control), the simplest structural change wins: rendering two views per optimizer step.
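As an illustrative sketch only (not the paper's implementation), two-view accumulation amounts to averaging the gradients of two rendered views before each optimizer update. The toy quadratic per-view loss, the `view_gradient` helper, and all other names below are our own stand-ins for the actual per-view rendering loss:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in: a flat parameter vector plays the role of the Gaussian
# parameters, and each "view" defines a quadratic per-view loss
# 0.5 * ||params - view||^2 whose gradient is cheap to write down.
params = np.full(4, 5.0)
views = [rng.normal(size=4) for _ in range(10)]

def view_gradient(params, view):
    # Gradient of 0.5 * ||params - view||^2 for one rendered view.
    return params - view

def step_two_view(params, views, lr=0.1):
    # Structural lever: sample TWO views per optimizer step and average
    # their gradients before the single update. Averaging two independent
    # per-view gradients halves the gradient variance per step.
    i, j = rng.choice(len(views), size=2, replace=False)
    g = 0.5 * (view_gradient(params, views[i]) + view_gradient(params, views[j]))
    return params - lr * g

for _ in range(100):
    params = step_two_view(params, views)
```

After enough steps the iterate hovers near the minimizer of the averaged per-view losses (here, the mean of the views), with a stationary noise floor set by the per-step gradient variance; the one-view baseline differs only in drawing a single index per step.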

The pairing rule (geometry-defined near/far, random, or active loss-disparity) does not change PSNR beyond seed variance on any of the five scenes; the structural change of rendering two views per step does. We propose a variance-decomposition framework that predicts and explains this finding: under bimodal camera regimes, between-regime gradient variance is small relative to within-regime variance in 3DGS, so structured and random pairings are variance-equivalent in expectation, and the variance halving from two-view accumulation itself is the dominant effect.
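The variance-decomposition argument can be checked numerically with the law of total variance. In the sketch below (illustrative numbers, not the paper's measurements), scalar "gradients" from two camera regimes have a small mean gap relative to their within-regime spread; a near/far pair removes only the between-regime term, so its variance nearly matches that of a random pair:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative per-view scalar "gradients" from two camera regimes
# (aerial / ground): small mean gap, large within-regime spread.
aerial = rng.normal(loc=0.2, scale=1.0, size=5000)
ground = rng.normal(loc=-0.2, scale=1.0, size=5000)
g = np.concatenate([aerial, ground])

# Law of total variance (exact here because the groups are equal-sized):
# Var(g) = E[Var(g | regime)] + Var(E[g | regime]).
within = 0.5 * (aerial.var() + ground.var())
between = 0.5 * ((aerial.mean() - g.mean()) ** 2 +
                 (ground.mean() - g.mean()) ** 2)
total = within + between

# Variance of a two-view averaged gradient under the two pairing rules:
var_random = 0.5 * total    # i.i.d. pair: halves the TOTAL variance
var_nearfar = 0.5 * within  # one view per regime: the between-regime
                            # term cancels; within variance is halved
```

When `between` is small relative to `within`, `total` is close to `within`, so the two pairing rules yield nearly identical averaged-gradient variance, which is the framework's prediction for why structured pairing cannot beat random pairing.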

We verify the framework on five scenes whose camera-altitude bimodality coefficients span [0.55, 1.00], and report the negative result that direction-aware projection, magnitude correction, confidence gating, and active loss-disparity pairing all fall within the seed variance of random two-view pairing. The two-view structural lever transfers cleanly to the Scaffold-GS and Pixel-GS backbones.

We position this work as an honest characterization of which training-side axes do and do not move PSNR for hybrid-capture 3DGS, together with a framework that explains why.