Lost in the Tail: Addressing Geographic Imbalance in Urban Visual Place Recognition

Urban-scale Visual Place Recognition (VPR) aims to identify the geographic location of a query image by matching it against a geo-tagged database. Recent methods achieve impressive performance but overlook a severe long-tailed distribution hidden in urban-scale datasets. This imbalance biases models toward frequently photographed locations, so they systematically fail in sparsely covered, less-visited areas.

In this paper, we systematically characterize this imbalance challenge and propose Distribution-Aware Place Recognition (DAPR), a model-agnostic plug-in framework that rebalances gradient contributions across head and tail classes. Additionally, within classification-retrieval pipelines, DAPR applies a multi-scale distance search mechanism to compute per-class distributional compactness, providing complementary gains at the retrieval stage.
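
To make the idea of rebalancing head and tail classes concrete, the following is a minimal sketch of one common long-tail technique, inverse-frequency loss weighting. It is not DAPR's actual formulation, which this abstract does not specify. The function names, the `power` parameter, and the toy class counts are all hypothetical and chosen only for illustration.

```python
import numpy as np


def inverse_frequency_weights(class_counts, power=1.0):
    """Per-class loss weights proportional to 1 / count**power (assumed scheme).

    Weights are normalised so that their mean over classes is 1.
    """
    counts = np.asarray(class_counts, dtype=np.float64)
    if np.any(counts <= 0):
        raise ValueError("every class needs at least one training image")
    weights = counts ** (-power)
    return weights * (len(counts) / weights.sum())


def weighted_cross_entropy(logits, labels, class_weights):
    """Mean softmax cross-entropy, each sample scaled by the weight of its class."""
    logits = np.asarray(logits, dtype=np.float64)
    labels = np.asarray(labels)
    # Numerically stable log-softmax.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    per_sample = -log_probs[np.arange(len(labels)), labels]
    return float(np.mean(class_weights[labels] * per_sample))


# Toy long-tailed setup (hypothetical numbers): class 0 is a head location
# with many images, class 2 is a tail location with very few.
counts = np.array([900, 90, 10])
weights = inverse_frequency_weights(counts)

# With power=1, count * weight is the same for every class, so each class
# carries the same total weight in the loss regardless of its image count.
total_weight_per_class = counts * weights

# One head sample and one tail sample with uninformative (all-zero) logits.
loss = weighted_cross_entropy(np.zeros((2, 3)), np.array([0, 2]), weights)
```

Because the gradient of a weighted loss scales with the per-sample weight, the few tail samples here pull on the model roughly as hard in aggregate as the many head samples. Choosing `power` below 1 would give a softer correction.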

On the large-scale SF-XL benchmark, our framework outperforms the previous classification-retrieval baseline by 18.3% on test set v1 and by 6.7% on test set v2. As a plug-in module, it consistently improves representative VPR methods on SF-XL, MSLS, and Pitts30k, demonstrating that it generalizes across both methods and benchmarks.
