Alignment-assisted Frequency Fusion Network for RGB-infrared vehicle detection

Published: 2025, Last Modified: 05 Nov 2025, Neurocomputing 2025, CC BY-SA 4.0
Abstract: Fusing complementary information from RGB and infrared (IR) modalities enables CNN-based detectors to learn a more robust vehicle representation, thus enhancing detection performance. However, the inherent modality gap between RGB and IR hinders effective feature fusion and limits detection performance. In this work, we decouple the RGB-IR vehicle detection task into two sequential subtasks: mitigating the modality gap and adaptively fusing multimodal features. Specifically, we introduce a novel Alignment-assisted Frequency Fusion vehicle detection Network (AFFNet), which comprises two innovative blocks: the Detail-aware Semantic Alignment Block (DSAB) and the Adaptive Frequency domain Fusion Block (AFFB). The DSAB alleviates the modality gap and improves semantic consistency by exploring feature correspondence through explicit bidirectional multimodal feature alignment. Furthermore, the AFFB decouples key vehicle features with 2D Discrete Fourier Transform (DCT) in the frequency domain and leverages the proposed fusion units to adaptively integrate fine-grained, frequency-domain vehicle features from different modalities, thereby enhancing the quality of the fused representations. We conducted experiments on the VEDAI and DroneVehicle multimodal vehicle datasets, achieving mAP50 improvements of 4% and 1.2%, respectively, over the comparison methods. Both the quantitative and qualitative results demonstrate the effectiveness of our method.
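The general idea of decoupling features in the frequency domain and fusing them per band can be illustrated with a minimal NumPy sketch. This is not the AFFB itself: it assumes a 2D DFT (`np.fft.fft2`) as the transform, a simple radial low/high split, and fixed scalar weights in place of the learned, adaptive fusion units. The names `split_bands`, `fuse`, `radius`, `w_low`, and `w_high` are hypothetical and chosen only for illustration.

```python
import numpy as np

def split_bands(feat, radius):
    """Split a 2D feature map into low- and high-frequency parts via a 2D DFT.

    Illustrative stand-in only: a hard radial mask, not the paper's decoupling.
    """
    h, w = feat.shape
    # Shift so that the zero-frequency component sits at (h // 2, w // 2).
    spec = np.fft.fftshift(np.fft.fft2(feat))
    yy, xx = np.ogrid[:h, :w]
    dist = np.hypot(yy - h // 2, xx - w // 2)
    low_mask = dist <= radius
    # Invert each masked spectrum; the input is real, so keep the real part.
    low = np.fft.ifft2(np.fft.ifftshift(spec * low_mask)).real
    high = np.fft.ifft2(np.fft.ifftshift(spec * ~low_mask)).real
    return low, high

def fuse(rgb, ir, radius=2, w_low=0.5, w_high=0.5):
    """Blend RGB and IR feature maps separately in each frequency band.

    w_low and w_high are fixed scalars here (an assumption); an adaptive
    scheme would predict them from the features instead.
    """
    rgb_low, rgb_high = split_bands(rgb, radius)
    ir_low, ir_high = split_bands(ir, radius)
    low = w_low * rgb_low + (1.0 - w_low) * ir_low
    high = w_high * rgb_high + (1.0 - w_high) * ir_high
    return low + high

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rgb_feat = rng.standard_normal((8, 8))
    ir_feat = rng.standard_normal((8, 8))
    # Favor IR for coarse structure and RGB for fine detail (arbitrary choice).
    fused = fuse(rgb_feat, ir_feat, radius=2, w_low=0.3, w_high=0.7)
    print(fused.shape)
```

Because the two masks partition the spectrum, the low and high parts sum back to the original map, so setting both weights to 1 returns the RGB features unchanged. Weighting the bands separately is what lets one modality dominate coarse structure while the other contributes fine detail.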