Abstract: Cross-view geo-localization aims to match scene images (e.g., ground-view images) with geo-tagged aerial images, which is crucial to a wide range of applications such as autonomous driving and street-view navigation. Existing methods neither address the perspective difference well nor effectively capture scene information. In this work, we propose a Geometric and Scene Information Fusion (GSIF) model for more accurate cross-view geo-localization. GSIF first learns the geometric information of scene and aerial images via log-polar transformation and spatial-attention aggregation to alleviate the perspective difference. It then mines the scene information of scene images via Sky View Factor (SVF) extraction. Finally, both kinds of information are fused for image matching, and a balanced loss function is introduced to boost matching accuracy. Experimental results on two real datasets show that our model significantly outperforms existing methods.
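For intuition, the log-polar step can be pictured as unrolling the aerial image around its center so that azimuth runs along one axis, roughly mirroring the horizontal layout of a ground-view panorama. The sketch below is a minimal illustration using OpenCV's `warpPolar`, not the authors' implementation; the function name, output size, and transpose convention are our own assumptions.

```python
import cv2
import numpy as np

# Minimal sketch (not the paper's code): unroll a square aerial image with a
# log-polar warp so its angular content lies along one axis. The parameter
# names and output resolution are illustrative assumptions.
def log_polar_aerial(aerial: np.ndarray,
                     radius_bins: int = 128,
                     angle_bins: int = 512) -> np.ndarray:
    h, w = aerial.shape[:2]
    center = (w / 2.0, h / 2.0)       # warp around the image center
    max_radius = min(w, h) / 2.0      # largest radius fully inside the image
    polar = cv2.warpPolar(
        aerial, (radius_bins, angle_bins), center, max_radius,
        cv2.INTER_LINEAR | cv2.WARP_POLAR_LOG,  # log-polar (not linear-polar)
    )
    # warpPolar puts radius on x and angle on y; transpose so angle runs
    # horizontally, like the azimuth axis of a ground-view panorama.
    return np.transpose(polar, (1, 0, 2))

if __name__ == "__main__":
    aerial = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
    print(log_polar_aerial(aerial).shape)  # (128, 512, 3)
```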