Keywords: Visual Foundation Model, Visual Place Recognition, Parameter-Efficient Fine-Tuning
TL;DR: This paper proposes a novel and effective Parameter-Efficient Fine-Tuning (PEFT) pipeline for adapting a visual foundation model to the visual place recognition task.
Abstract: Visual Place Recognition (VPR) is essential for mobile robots, as it enables them to retrieve from a database the images closest to their current location. Progress in Visual Foundation Models (VFMs) has significantly advanced VPR by providing representative image descriptors. However, existing fine-tuning efforts for VFMs often overlook the crucial role of probing in adapting these descriptors into effective image representations. In this paper, we propose a Centroid-Free Probing (CFP) stage that makes novel use of second-order features to exploit VFM descriptors more effectively. Moreover, to adaptively control the preservation of task-specific information based on the VPR context, we introduce a Dynamic Power Normalization (DPN) module in both the recalibration and CFP stages, forming a novel Parameter-Efficient Fine-Tuning (PEFT) pipeline (EMVP) tailored to the VPR task. Extensive experiments demonstrate the superiority of the proposed CFP over existing probing methods and show that the EMVP pipeline further improves fine-tuning accuracy and efficiency. Specifically, it achieves 93.9%, 96.5%, and 94.6% Recall@1 on the MSLS Validation, Pitts250k-test, and SPED datasets, respectively, while using 64.3% fewer trainable parameters than the existing SOTA PEFT method.
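To make the abstract's core ideas concrete, the sketch below illustrates generic second-order (covariance-style) pooling of patch descriptors combined with a learnable power normalization. It is a minimal illustration under assumed shapes and names, not the paper's actual CFP or DPN implementation; the module name, the scalar exponent `alpha`, and the ViT-style token layout are all assumptions for exposition.

```python
# Minimal sketch (illustrative only): second-order pooling of patch descriptors
# followed by a learnable power normalization. Assumes a frozen ViT-style
# backbone emitting N patch tokens of dimension D. This is NOT the paper's
# CFP/DPN; names and shapes are assumptions.
import torch
import torch.nn as nn


class SecondOrderPooling(nn.Module):
    """Aggregate patch tokens into a second-order (covariance-like) descriptor."""

    def __init__(self, alpha_init: float = 0.5):
        super().__init__()
        # Learnable exponent standing in for an adaptive power normalization.
        self.alpha = nn.Parameter(torch.tensor(alpha_init))

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (B, N, D) patch descriptors from a frozen foundation model
        B, N, D = tokens.shape
        tokens = tokens - tokens.mean(dim=1, keepdim=True)       # center tokens
        cov = torch.einsum("bnd,bne->bde", tokens, tokens) / N   # (B, D, D)
        # Signed power normalization dampens dominant correlations
        cov = torch.sign(cov) * torch.abs(cov).clamp_min(1e-8).pow(self.alpha)
        desc = cov.flatten(1)                                     # (B, D*D)
        return nn.functional.normalize(desc, dim=-1)              # L2 normalize


if __name__ == "__main__":
    pooled = SecondOrderPooling()(torch.randn(2, 196, 64))
    print(pooled.shape)  # torch.Size([2, 4096])
```

In practice such a pooled descriptor would be compared with database descriptors via cosine similarity for retrieval; the paper's contribution lies in how the probing and normalization stages are integrated into a parameter-efficient pipeline rather than in this generic pooling itself.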
Primary Area: Robotics
Submission Number: 9188