Optimal Transport Aggregation for Visual Place Recognition

Published: 01 Jan 2024, Last Modified: 08 May 2025, Venue: CVPR 2024, License: CC BY-SA 4.0
Abstract: The task of Visual Place Recognition (VPR) aims to match a query image against references from an extensive database of images from different places, relying solely on visual cues. State-of-the-art pipelines focus on the aggregation of features extracted from a deep backbone, in order to form a global descriptor for each image. In this context, we introduce SALAD (Sinkhorn Algorithm for Locally Aggregated Descriptors), which reformulates NetVLAD's soft-assignment of local features to clusters as an optimal transport problem. In SALAD, we consider both feature-to-cluster and cluster-to-feature relations and we also introduce a ‘dustbin’ cluster, designed to selectively discard features deemed non-informative, enhancing the overall descriptor quality. Additionally, we leverage and fine-tune DINOv2 as a backbone, which provides enhanced description power for the local features, and dramatically reduces the required training time. As a result, our single-stage method not only surpasses single-stage baselines on public VPR datasets, but also surpasses two-stage methods that add a re-ranking stage at significantly higher cost. Code and models are available at https://github.com/serizbalsalad.
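To make the optimal-transport view of soft-assignment concrete, the following is a minimal sketch of a log-domain Sinkhorn assignment with an extra dustbin column. It is not the authors' implementation: the function name `sinkhorn_assign`, the constant `dustbin_score`, the regularization `reg`, and the choice of marginals (unit mass per feature and per cluster, with the dustbin absorbing the remainder) are all assumptions made for illustration. The abstract does not specify these details.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def sinkhorn_assign(scores, dustbin_score=1.0, num_iters=50, reg=1.0):
    """Soft-assign n features to m clusters plus one dustbin (illustrative).

    scores[i][j] is the affinity of feature i to cluster j.
    Returns an n x (m + 1) matrix; the last column is the dustbin.
    Assumed marginals: each feature carries mass 1, each cluster
    receives mass 1, and the dustbin receives mass n - m.
    """
    n, m = len(scores), len(scores[0])
    if n <= m:
        raise ValueError("this sketch assumes more features than clusters")

    # Append a constant dustbin score to every feature's row.
    K = [[s / reg for s in row] + [dustbin_score / reg] for row in scores]

    log_a = [0.0] * n                          # row marginals: log(1)
    log_b = [0.0] * m + [math.log(n - m)]      # column marginals
    u = [0.0] * n                              # row potentials
    v = [0.0] * (m + 1)                        # column potentials

    for _ in range(num_iters):
        # Rescale rows (feature-to-cluster), then columns (cluster-to-feature).
        u = [log_a[i] - logsumexp([K[i][j] + v[j] for j in range(m + 1)])
             for i in range(n)]
        v = [log_b[j] - logsumexp([K[i][j] + u[i] for i in range(n)])
             for j in range(m + 1)]

    return [[math.exp(K[i][j] + u[i] + v[j]) for j in range(m + 1)]
            for i in range(n)]


# Toy input: four features, two clusters; the last feature is uninformative.
toy_scores = [[3.0, 0.0], [0.0, 3.0], [2.5, 0.5], [0.0, 0.0]]
for row in sinkhorn_assign(toy_scores):
    print([round(p, 3) for p in row])
```

Alternating the row and column rescaling is what lets the assignment account for both directions, feature-to-cluster and cluster-to-feature, whereas a plain per-feature softmax normalizes in one direction only. In this sketch the dustbin score is a fixed constant; a global descriptor would then be built from the non-dustbin columns, so mass sent to the dustbin is effectively discarded.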