Abstract: Modern infrared (IR) technology has proven highly valuable for remote sensing imagery (RSI). Multimodal RSI object detection based on red–green–blue (RGB)–IR image pairs has therefore attracted widespread research attention. However, capturing features in the IR domain is challenging, as existing object detectors rely heavily on chromatic information in the RGB domain. Furthermore, the quality of RGB images can be degraded by complex environmental conditions, limiting the practicality of multimodal detection. In this letter, we introduce cross-modal you only look once (CM-YOLO), a lightweight yet effective object detector designed specifically for IR remote sensing images. CM-YOLO employs cross-modal adaptation to enhance awareness of IR–RGB modality translation. Specifically, we leverage a prior modality translator (PMT) to learn infrared–visible (IV) features, which are incorporated into the detection backbone through our IV-gate modules. Experimental results on the VEDAI dataset demonstrate that CM-YOLO significantly outperforms conventional methods. Moreover, CM-YOLO exhibits strong generalization for IR-based object detection in urban scenes on the FLIR dataset.
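The abstract does not specify the internal form of the IV-gate modules, but a gated feature fusion of this kind is commonly realized as a sigmoid-weighted convex blend between the backbone features and the translated IV features. The following minimal sketch illustrates that idea in plain Python; the function name `iv_gate_fuse` and the per-channel scalar formulation are illustrative assumptions, not the paper's actual implementation.

```python
import math

def sigmoid(x):
    # standard logistic function, maps any real logit into (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def iv_gate_fuse(backbone_feat, iv_feat, gate_logits):
    """Hypothetical IV-gate fusion: blend backbone features with
    translated infrared-visible (IV) features via a sigmoid gate.
    All three arguments are flat lists of equal length (one scalar
    per channel); in a real detector these would be feature maps
    and the gate would be produced by a learned layer."""
    fused = []
    for f, v, g in zip(backbone_feat, iv_feat, gate_logits):
        a = sigmoid(g)                       # gate value in (0, 1)
        fused.append(a * v + (1.0 - a) * f)  # convex blend per channel
    return fused

# Toy usage: a strongly positive logit selects the IV feature,
# a strongly negative one keeps the backbone feature.
print(iv_gate_fuse([1.0, 0.0], [0.0, 1.0], [10.0, -10.0]))
```

With a zero logit the gate sits at 0.5 and the two modalities are averaged, so the gate can smoothly interpolate between relying on the original backbone features and the translated IV features.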