Abstract: Anchor-based detectors have recently achieved decent performance in multimodal remote sensing scenarios, whereas their anchor-free counterparts fail to reach comparable results. To remedy this problem, we first comprehensively investigate the misalignment issues in multimodal features and detection heads, and then present a dual-perspective alignment learning (DPAL) framework for multimodal remote sensing object detection. In particular, we design a cross-modal alignment module (CMAM), which employs a multiscale dilation strategy and a differentiable alignment function with channel-wise modulation for cross-modal feature integration. In addition, to cope with the misalignment between the regression and classification heads, we propose a task-head alignment module (THAM). It introduces a novel pseudo-anchor mechanism, adopts a semi-fixed offset generation strategy to capture task-variant sampling coordinates, and finally deploys an offset knowledge transfer mechanism with deformable alignment for anchor-free detection heads. Extensive experiments on four multimodal object detection datasets demonstrate the impressive performance of the proposed DPAL framework. The project code is released at https://github.com/lyf0801/DPAL
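To make the cross-modal fusion idea concrete, the following is a minimal sketch (not the authors' released code) of channel-wise modulation between two modality feature maps, in the spirit of the CMAM described above. All function names, tensor shapes, and the gating formulation are illustrative assumptions; see the linked repository for the actual implementation.

```python
# Hedged sketch: channel-wise modulated fusion of two modality features.
# The gating scheme and shapes here are assumptions for illustration only.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_modulated_fusion(feat_rgb, feat_ir):
    """Fuse two (C, H, W) feature maps from different modalities.

    A per-channel gate is computed from globally pooled statistics of both
    modalities; each modality is re-weighted channel-wise before summation.
    """
    # Global average pooling over spatial dims -> (C,)
    pooled = feat_rgb.mean(axis=(1, 2)) + feat_ir.mean(axis=(1, 2))
    gate = sigmoid(pooled)[:, None, None]  # (C, 1, 1) channel-wise gate
    # Channel-wise modulation: gated blend of the two modalities
    return gate * feat_rgb + (1.0 - gate) * feat_ir

rgb = np.random.rand(8, 16, 16)   # e.g. optical feature map
ir = np.random.rand(8, 16, 16)    # e.g. infrared feature map
fused = channel_modulated_fusion(rgb, ir)
print(fused.shape)  # (8, 16, 16)
```

The fused map keeps the input resolution, so it can feed the detection heads directly; in the full framework this fusion would sit after the multiscale dilated feature extraction.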