Adaptive Geodesic Conformal Prediction for Egocentric Camera Pose Estimation
Abstract: Egocentric pose estimation for Augmented Reality (AR) and assistive devices requires not only accurate predictions but also uncertainty regions with coverage guarantees.
Conformal prediction (CP) provides such guarantees without retraining, but we show that standard CP with a single fixed threshold attains the nominal 90% overall coverage while covering only ~60% of the hardest 25% of frames (Q4): a gap of roughly 30 percentage points in conditional coverage, consistent across 12 participants, 3 predictors, and 3 prediction horizons (108 evaluations) on EPIC-Fields.
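The fixed-threshold baseline referred to here is the standard split-conformal recipe: calibrate a single quantile of nonconformity scores and apply it to every test frame. A minimal sketch, using synthetic scores rather than the paper's setup:

```python
import numpy as np

def conformal_threshold(cal_scores, alpha=0.1):
    """Finite-sample (1 - alpha) quantile of calibration nonconformity
    scores, as in standard split conformal prediction."""
    n = len(cal_scores)
    q = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return np.quantile(cal_scores, q, method="higher")

# Illustration with synthetic scores (not EPIC-Fields data):
rng = np.random.default_rng(0)
cal_scores = rng.exponential(size=1000)
tau = conformal_threshold(cal_scores, alpha=0.1)

test_scores = rng.exponential(size=1000)
# Marginal coverage lands near the 0.90 target on average,
# but nothing constrains coverage on the hardest frames.
coverage = (test_scores <= tau).mean()
```

The guarantee this procedure gives is marginal over all frames, which is exactly why conditional coverage on the hard Q4 subset can fall far below the nominal level.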
We further show that a geodesic SE(3) nonconformity score identifies physically harder frames than Euclidean scoring: the geodesic and Euclidean Q4 sets overlap by only 15-26%, and geodesic Q4 frames exhibit 2-3x higher ground-truth camera displacement.
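A common way to build such a score combines the SO(3) geodesic angle of the relative rotation with the translation error; the weight `lam` below is an illustrative assumption, not the paper's exact formulation:

```python
import numpy as np

def geodesic_se3_score(R_pred, t_pred, R_gt, t_gt, lam=1.0):
    """Illustrative geodesic SE(3) nonconformity score: SO(3) geodesic
    angle of the relative rotation plus lam-weighted translation error.
    (lam is an assumed trade-off weight, not taken from the paper.)"""
    R_rel = R_pred.T @ R_gt
    # Rotation geodesic: angle of R_rel, via trace formula on SO(3)
    cos_theta = np.clip((np.trace(R_rel) - 1.0) / 2.0, -1.0, 1.0)
    theta = np.arccos(cos_theta)  # radians, in [0, pi]
    t_err = np.linalg.norm(np.asarray(t_pred) - np.asarray(t_gt))
    return theta + lam * t_err
```

With identity rotations the score reduces to the translation norm; a 90-degree rotation alone contributes pi/2, so rotation error is penalized on the manifold rather than through coordinate differences.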
To close the coverage gap, we propose DINOv2-Bridge adaptive CP: a two-stage difficulty estimator trained on a single source participant that transfers across participants without requiring any images at test time, improving Q4 coverage from ~0.75 to ~0.93 while maintaining overall coverage at the 90% target.
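One standard adaptive-CP recipe consistent with this description normalizes each score by a predicted difficulty before calibrating, so harder frames receive wider regions. The difficulty values below are synthetic stand-ins for what a learned estimator such as DINOv2-Bridge would output:

```python
import numpy as np

def adaptive_threshold(cal_scores, cal_difficulty, alpha=0.1):
    """Difficulty-normalized split conformal calibration: threshold the
    ratio score / difficulty instead of the raw score."""
    norm = np.asarray(cal_scores) / np.asarray(cal_difficulty)
    n = len(norm)
    q = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return np.quantile(norm, q, method="higher")

# Synthetic heteroscedastic scores: error scale tracks difficulty
rng = np.random.default_rng(1)
diff_cal = rng.uniform(0.5, 2.0, size=2000)
cal_scores = diff_cal * rng.exponential(size=2000)
tau = adaptive_threshold(cal_scores, diff_cal, alpha=0.1)

diff_test = rng.uniform(0.5, 2.0, size=2000)
test_scores = diff_test * rng.exponential(size=2000)
# Per-frame region radius tau * difficulty covers ~90% overall,
# and, unlike a fixed threshold, also adapts on hard frames.
coverage = (test_scores <= tau * diff_test).mean()
```

Because the threshold is set on normalized scores, the per-frame region radius `tau * difficulty` expands exactly where the estimator predicts large error, which is the mechanism by which Q4 coverage can be lifted without inflating easy frames.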