The Overlooked Value of Test-time Reference Sets in Visual Place Recognition

Published: 22 Jul 2025, Last Modified: 19 Oct 2025 · ICCV 2025 Workshop CroCoDL · CC BY 4.0
Abstract: Given a query image, Visual Place Recognition (VPR) is the task of retrieving an image of the same place from a reference database, with robustness to viewpoint and appearance changes. Recent work shows that some VPR benchmarks are effectively solved by methods that use Vision-Foundation-Model backbones and are trained on large-scale, diverse VPR-specific datasets. However, several benchmarks remain challenging, particularly when the test environments differ significantly from the usual VPR training datasets. We propose a complementary, unexplored source of information to bridge this train-test domain gap, which can further improve the performance of state-of-the-art (SOTA) VPR methods on such challenging benchmarks. Concretely, we observe that the test-time reference set, the ``map'', contains images and poses from the target domain, and must be available before the test-time query is received in several VPR applications. We therefore propose simple Reference-Set-Finetuning (RSF) of VPR models on the map, boosting the SOTA ($\approx 2.3\%$ average Recall@1 increase) on these challenging datasets. Finetuned models retain generalization, and RSF works across diverse test datasets.
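The retrieval setting the abstract describes can be sketched as nearest-neighbor search over image descriptors, evaluated with Recall@1. The sketch below is illustrative only, not the authors' code: the `embed` function is a hypothetical stand-in for a VPR descriptor model (in practice a Vision-Foundation-Model backbone, possibly after RSF on the map), and the query descriptors are synthesized as noisy copies of reference descriptors so retrieval is meaningful.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a VPR descriptor model: real methods extract
# descriptors with a learned backbone; here we use random unit vectors.
def embed(items, dim=128):
    x = rng.normal(size=(len(items), dim))
    return x / np.linalg.norm(x, axis=1, keepdims=True)

# Reference set (the "map") of 100 places.
ref_ids = np.arange(100)
ref_desc = embed(ref_ids)

# Queries revisit the first 10 places; simulate viewpoint/appearance change
# as small perturbations of the corresponding reference descriptors.
query_ids = np.arange(10)
query_desc = ref_desc[:10] + 0.05 * rng.normal(size=(10, 128))
query_desc /= np.linalg.norm(query_desc, axis=1, keepdims=True)

# Retrieval: cosine similarity (dot product of unit vectors), top-1 match.
sims = query_desc @ ref_desc.T            # shape: (num_queries, num_refs)
top1 = sims.argmax(axis=1)

# Recall@1: fraction of queries whose top-1 retrieval is the correct place.
recall_at_1 = float(np.mean(top1 == query_ids))
print(f"Recall@1: {recall_at_1:.2f}")
```

RSF would correspond to adapting the weights behind `embed` on the reference images before this retrieval step, which is what yields the reported Recall@1 gains.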
Submission Number: 4