Abstract: Cross-view image geo-localization predicts the geographic location of a ground-view image by matching it against geotagged satellite images. Existing methods commonly use a triplet loss, which cannot fully exploit the large number of available negative samples. The InfoNCE loss can leverage multiple negative samples within a batch, but it treats positive and negative samples equally and therefore fails to emphasize the importance of positive samples. In addition, current methods consider only a single dimension of similarity when comparing features, which makes it difficult to mine hard negative samples with high similarity. To address these issues, this paper proposes a cross-view image geo-localization method based on a weighted InfoNCE loss and multi-dimensional similarity measurement. In the InfoNCE loss, we assign an appropriate weight to the similarity score of positive samples, increasing their significance within the batch and guiding the model to learn the features of positive samples more effectively. During training, to better mine hard negative samples, we introduce a multi-dimensional similarity measurement method that integrates cosine similarity, Euclidean distance, and Manhattan distance, enabling a more comprehensive assessment of image similarity. Furthermore, we employ a weighted asymmetric bidirectional loss strategy to utilize the loss from both information streams. Experimental results demonstrate that our method achieves superior performance across multiple metrics on the CVUSA and CVACT datasets. Ablation studies further validate the effectiveness of the weighted InfoNCE loss, the bidirectional asymmetric loss strategy, and the multi-dimensional similarity measurement approach.
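The abstract does not give the exact formulation, so the following is only a minimal NumPy sketch of how the three ideas could fit together, not the authors' implementation. Every name and default here is an assumption made for illustration: `pos_weight` multiplies the positive-pair similarity before the softmax, `alpha` splits the loss unevenly between the two matching directions, and `coeffs` linearly combines cosine similarity with negated Euclidean and Manhattan distances.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit L2 norm."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def log_softmax_rows(logits):
    """Numerically stable row-wise log-softmax."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def weighted_info_nce(query, reference, pos_weight=1.5, temperature=0.1):
    """InfoNCE over a batch where row i of `query` matches row i of `reference`.

    Assumption: the positive similarity (the diagonal) is multiplied by
    `pos_weight`; the paper may apply its weight differently.
    """
    sim = l2_normalize(query) @ l2_normalize(reference).T  # (B, B) cosine
    idx = np.arange(sim.shape[0])
    sim[idx, idx] *= pos_weight
    log_prob = log_softmax_rows(sim / temperature)
    return float(-log_prob[idx, idx].mean())


def bidirectional_loss(ground, satellite, alpha=0.6, **kwargs):
    """Asymmetric weighted sum of the two directional losses.

    Assumption: `alpha` weights ground-to-satellite, `1 - alpha` the reverse.
    """
    g2s = weighted_info_nce(ground, satellite, **kwargs)
    s2g = weighted_info_nce(satellite, ground, **kwargs)
    return alpha * g2s + (1.0 - alpha) * s2g


def multi_dim_similarity(query, reference, coeffs=(1.0, 0.5, 0.5)):
    """Combine cosine similarity with negated Euclidean and Manhattan distances.

    Assumption: a plain linear combination; higher score means more similar.
    """
    q, r = l2_normalize(query), l2_normalize(reference)
    cosine = q @ r.T
    diff = q[:, None, :] - r[None, :, :]  # (B, B, D) pairwise differences
    euclidean = np.sqrt((diff ** 2).sum(axis=-1))
    manhattan = np.abs(diff).sum(axis=-1)
    a, b, c = coeffs
    return a * cosine - b * euclidean - c * manhattan


def hardest_negatives(score, k=1):
    """Indices of the k highest-scoring non-matching references per query."""
    masked = score.copy()
    np.fill_diagonal(masked, -np.inf)  # exclude the positive pair
    return np.argsort(-masked, axis=1)[:, :k]
```

In a training loop, one would typically call `bidirectional_loss` on a batch of ground and satellite embeddings, and use `hardest_negatives(multi_dim_similarity(...))` to decide which samples to place in the same batch next. Because the combined score uses three measures, two images that look close under cosine similarity alone can still be ranked apart.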
External IDs: dblp:conf/ijcnn/MaYZ25