Abstract: Image-based cross-view localization often yields imprecise camera pose estimates due to the limited sampling density of the satellite image database. Moreover, current cross-view camera pose estimation approaches typically rely on single-scale feature representations and overlook the importance of multi-scale features. In this paper, we introduce a novel coarse-to-fine cross-view camera pose estimation method (C2F-CCPE) that leverages multi-scale feature fusion and a localization and orientation feature fusion module (LOFFM) to improve both localization and orientation prediction. By fusing multi-scale features, C2F-CCPE captures global and local information simultaneously, improving robustness against occlusions and precision in complex scenes. LOFFM further aggregates directional and positional information, enabling the network to comprehend location features more deeply. Quantitative and qualitative experiments on two benchmark datasets demonstrate that the proposed method outperforms state-of-the-art models and verify the effectiveness of its components.