Abstract: Visual Place Recognition (VPR) techniques commonly use Contrastive Losses (CL) to train models that generate compact and discriminative global image descriptors. Such models often perform poorly because of one of two issues in training: 1) loss functions that focus primarily on easier samples, or 2) reliance on time-consuming hard-sample mining to identify informative supervisory samples, which hinders effective learning from large-scale datasets. To improve both learning efficiency and effectiveness, we propose a Curricular Contrastive Loss (CCL) that uses graded similarity labels as a measure of sample difficulty. Inspired by human learning, which begins with easier concepts and progressively tackles more challenging ones, CCL dynamically emphasizes easier samples in the early stages of training to achieve rapid convergence. It then gradually shifts its focus to harder samples in later stages to strengthen the model's robustness under challenging conditions. We evaluate the proposed method extensively on popular VPR datasets, and the results show that it outperforms both the CL and Generalized CL (GCL) functions.
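The abstract does not state the CCL formula, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a per-pair generalized contrastive term with a graded similarity label psi in [0, 1], of the form psi/2 · d² + (1 − psi)/2 · max(margin − d, 0)². It then applies a hypothetical curriculum weight that favors easy pairs early and hard pairs late. The difficulty proxy (pairs with psi near 0.5 treated as hardest), the linear schedule over a `progress` value in [0, 1], and the names `margin`, `strength`, `curricular_weight` and `curricular_gcl` are all assumptions made for illustration.

```python
def generalized_contrastive_loss(d, psi, margin=0.5):
    """Per-pair loss with a graded similarity label psi in [0, 1].

    d is the distance between two global descriptors. psi = 1 pulls the pair
    together; psi = 0 pushes it apart beyond `margin` (assumed form).
    """
    return 0.5 * psi * d ** 2 + 0.5 * (1.0 - psi) * max(margin - d, 0.0) ** 2


def curricular_weight(psi, progress, strength=0.5):
    """Hypothetical curriculum weight (not taken from the paper).

    Difficulty h is 0 for clear positives/negatives (psi = 1 or 0) and 1 for
    ambiguous pairs (psi = 0.5). At progress = 0, easy pairs get weight
    1 + strength and hard pairs 1 - strength; at progress = 1 this is reversed.
    Weights stay positive as long as strength < 1.
    """
    h = 1.0 - abs(2.0 * psi - 1.0)
    return 1.0 + strength * (2.0 * progress - 1.0) * (2.0 * h - 1.0)


def curricular_gcl(distances, psis, progress, margin=0.5, strength=0.5):
    """Mean curriculum-weighted loss over a batch of descriptor pairs."""
    if not distances or len(distances) != len(psis):
        raise ValueError("distances and psis must be non-empty and equal length")
    if not 0.0 <= progress <= 1.0:
        raise ValueError("progress must lie in [0, 1]")
    total = sum(
        curricular_weight(p, progress, strength)
        * generalized_contrastive_loss(d, p, margin)
        for d, p in zip(distances, psis)
    )
    return total / len(distances)


if __name__ == "__main__":
    # One easy pair (psi = 1.0) and one ambiguous pair (psi = 0.5).
    dists, labels = [0.2, 0.3], [1.0, 0.5]
    for progress in (0.0, 0.5, 1.0):
        print(progress, curricular_gcl(dists, labels, progress))
```

The linear schedule is the simplest choice that keeps every pair's weight positive. A real curriculum might instead use a non-linear schedule or derive difficulty from the model's current loss rather than from psi alone.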