Abstract: The goal of street-to-aerial cross-view image geo-localization is to determine the location of a query street-view image by retrieving the corresponding aerial-view image of the same place. The drastic viewpoint and appearance gap between aerial-view and street-view images poses a significant challenge to this task. In this paper, we propose a novel multiscale attention encoder to capture the multiscale contextual information of aerial/street-view images. To bridge the domain gap between the two views, we first apply an inverse polar transform so that street-view images are approximately aligned with aerial-view images. Then, the proposed multiscale attention encoder converts each image into a feature representation guided by the learnt multiscale information. Finally, we propose a novel global mining strategy that enables the network to pay more attention to hard negative exemplars. Experiments on standard benchmark datasets show that our approach obtains an 81.39% top-1 recall rate on the CVUSA dataset and 71.52% on the CVACT dataset, achieving state-of-the-art performance and significantly outperforming most existing methods.
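The inverse polar transform mentioned above can be illustrated with a minimal sketch: each pixel of a square, aerial-like target image is mapped to a panorama column via its polar angle and to a panorama row via its radius. This is an assumed, simplified nearest-neighbor mapping for illustration (function name and exact angle/radius conventions are hypothetical, not the paper's implementation):

```python
import numpy as np

def inverse_polar_transform(pano, out_size):
    """Warp an equirectangular street-view panorama (H_p x W_p [x C])
    into a square, aerial-like layout (out_size x out_size [x C]).

    For every target pixel, the polar angle around the image center
    selects a panorama column, and the distance from the center selects
    a panorama row (horizon at the center, ground near the border).
    Nearest-neighbor sampling only; a hedged sketch, not the paper's code.
    """
    h_p, w_p = pano.shape[:2]
    c = (out_size - 1) / 2.0  # center of the target image
    out = np.zeros((out_size, out_size) + pano.shape[2:], dtype=pano.dtype)
    for i in range(out_size):
        for j in range(out_size):
            dy, dx = i - c, j - c
            r = np.sqrt(dx * dx + dy * dy)
            # angle normalized to [0, 1] -> panorama column
            theta = (np.arctan2(dy, dx) + np.pi) / (2.0 * np.pi)
            src_x = int(theta * (w_p - 1))
            # radius normalized to [0, 1] -> panorama row
            src_y = int(min(r / c, 1.0) * (h_p - 1))
            out[i, j] = pano[src_y, src_x]
    return out
```

With this alignment, a CNN sees roughly corresponding spatial layouts in both views, which is what makes the subsequent attention-based feature matching tractable.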
External IDs: dblp:journals/caaitrit/LiTCY23