DWSF-Net: A Dynamic Wavelet-based Spatial-frequency Fusion Network for Multispectral Object Detection
Abstract: Multispectral object detection aims to identify targets under diverse illumination conditions by leveraging complementary information from multiple spectral modalities. A major challenge in this field lies in effectively fusing multispectral features while accounting for both spatial and frequency-domain characteristics. Existing methods primarily focus on spatial fusion, often neglecting critical frequency-domain cues and treating all spectral channels equally, despite their distinct properties. In particular, RGB images capture high-frequency texture and color, whereas infrared (IR) images focus on low-frequency thermal signatures, which renders conventional spatial-only fusion suboptimal. To address these challenges, we propose a novel Dynamic Wavelet-based Spatial-Frequency Fusion Network (DWSF-Net) that integrates both spatial and frequency information for enhanced multispectral representation. DWSF-Net introduces a learnable wavelet encoder to adaptively extract frequency-aware features, a wavelet modulation fusion module to selectively combine informative sub-bands across spectra, and a frequency-domain sub-band fusion scheme with adaptive weight learning to refine cross-spectral integration. Finally, modulated spatial features and adaptively fused frequency components are aggregated to form the final representation. Extensive experiments conducted on three public datasets demonstrate that the proposed DWSF-Net achieves state-of-the-art performance, highlighting its effectiveness and potential for improving the accuracy of multispectral object detection.
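To make the idea of sub-band fusion concrete, the following is a minimal sketch and not the DWSF-Net implementation. It assumes a fixed single-level Haar wavelet in place of the learnable wavelet encoder, and it assumes a softmax over sub-band energy as a stand-in for the learned adaptive weights. The function names (haar_dwt2, haar_idwt2, fuse_subbands) and the sub-band labels are hypothetical, chosen only for illustration.

```python
import numpy as np

def haar_dwt2(x):
    """Single-level 2D Haar transform of an array with even height and width.

    Returns four sub-bands (ll, lh, hl, hh). The lh/hl labels are an
    illustrative convention; naming differs between libraries.
    """
    a, b = x[0::2, 0::2], x[0::2, 1::2]   # top-left, top-right of each 2x2 block
    c, d = x[1::2, 0::2], x[1::2, 1::2]   # bottom-left, bottom-right
    ll = (a + b + c + d) / 2.0            # low-frequency approximation
    lh = (a - b + c - d) / 2.0            # differences across columns
    hl = (a + b - c - d) / 2.0            # differences across rows
    hh = (a - b - c + d) / 2.0            # diagonal differences
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2: rebuild the full-resolution array."""
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w), dtype=np.float64)
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    x[0::2, 1::2] = (ll - lh + hl - hh) / 2.0
    x[1::2, 0::2] = (ll + lh - hl - hh) / 2.0
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return x

def fuse_subbands(rgb, ir):
    """Fuse two single-channel maps sub-band by sub-band.

    ASSUMPTION: weights come from a softmax over mean sub-band energy.
    In the paper the weights are learned; this heuristic only illustrates
    the data flow.
    """
    fused = []
    for s_rgb, s_ir in zip(haar_dwt2(rgb), haar_dwt2(ir)):
        energy = np.array([np.mean(s_rgb ** 2), np.mean(s_ir ** 2)])
        w = np.exp(energy - energy.max())  # numerically stable softmax
        w /= w.sum()
        fused.append(w[0] * s_rgb + w[1] * s_ir)
    return haar_idwt2(*fused)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rgb_map = rng.random((8, 8))   # stand-in for a grayscale RGB feature map
    ir_map = rng.random((8, 8))    # stand-in for an IR feature map
    print(fuse_subbands(rgb_map, ir_map).shape)
```

Because each sub-band receives its own pair of weights, a modality can dominate the high-frequency bands while the other dominates the low-frequency band, which is the behavior that spatial-only fusion cannot express directly.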