Aerial-view geo-localization based on multi-layer local pattern cross-attention network

Published: 01 Jan 2024 · Last Modified: 09 Nov 2024 · Appl. Intell. 2024 · CC BY-SA 4.0
Abstract: Aerial-view geo-localization aims to determine the geographic location of a target of interest by matching drone-view images against a geo-tagged satellite image database. The key underpinning of this task is mining discriminative features to form a view-invariant representation of the same target location. To this end, existing methods usually focus on extracting fine-grained information from the final feature map while neglecting the importance of middle-layer outputs. In this work, we propose a Transformer-based network, named Multi-layer Local Pattern Cross Attention Network (MLPCAN). Specifically, we employ the cross-attention block (CAB) to establish correlations between feature maps from different layers as images are fed through the network. We then apply the square-ring partition strategy to divide the feature maps from different layers into multiple local pattern blocks. To address the information misalignment across multi-layer features, we propose the multi-layer aggregation block (MAB), which aggregates the high-association feature blocks obtained by the division. Extensive experiments on two public datasets, i.e., University-1652 and SUES-200, show that the proposed model significantly improves the accuracy of geo-localization and achieves competitive results.
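The square-ring partition mentioned in the abstract can be illustrated with a minimal NumPy sketch: a feature map is split into concentric square rings around its center (as in LPN-style local pattern methods), and each ring is average-pooled into one descriptor. The function name, the use of Chebyshev distance to index rings, and per-ring average pooling are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def square_ring_partition(feat, num_rings):
    """Sketch: split a (C, H, W) feature map into `num_rings` concentric
    square rings and average-pool each ring into a C-dim vector.
    Returns an array of shape (num_rings, C)."""
    C, H, W = feat.shape
    # Chebyshev (max-norm) distance of each spatial cell from the map center;
    # equal-distance cells form square rings.
    ys = np.abs(np.arange(H) - (H - 1) / 2.0)
    xs = np.abs(np.arange(W) - (W - 1) / 2.0)
    cheb = np.maximum.outer(ys, xs)
    # Map distances to ring ids 0..num_rings-1 (0 = innermost square).
    ring = np.floor(cheb / (cheb.max() + 1e-6) * num_rings).astype(int)
    ring = np.minimum(ring, num_rings - 1)
    # Average-pool the features inside each ring.
    return np.stack([feat[:, ring == r].mean(axis=1) for r in range(num_rings)])
```

Applied to feature maps from several layers, this yields ring-wise part descriptors that a module like the MAB could then align and aggregate across layers.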