FSCMF: A Dual-Branch Frequency-Spatial Joint Perception Cross-Modality Network for Visible and Infrared Image Fusion

Published: 01 Jan 2025 · Last Modified: 01 Aug 2025 · Neurocomputing 2025 · CC BY-SA 4.0
Abstract: Existing image fusion methods face limitations in deep feature modeling and high-frequency information enhancement, leading to detail loss and reduced target saliency in complex scenarios. To address these issues, this paper proposes a Dual-Branch Frequency-Spatial Joint Perception Cross-Modality Network (FSCMF), which integrates local details, global context, and frequency-domain information through a dual-branch architecture to enhance multimodal feature complementarity. Specifically, FSCMF combines a CNN and a Transformer in a dual-branch design, where the CNN branch focuses on extracting local structures and texture details, while the Transformer branch captures long-range dependencies to improve global consistency. To further optimize feature representation, we introduce the Frequency-Spatial Adaptive Attention Module (FSAA), in which the frequency-domain branch enhances high-frequency components to improve edge sharpness, while the spatial-domain branch adaptively refines salient-region features, ensuring a dynamic balance between global and local information. Additionally, we propose the Weighted Cross-Spectral Feature Fusion Module (WCSFF), which strengthens cross-modality feature interaction through adaptive weighting, thereby improving detail integrity and semantic consistency in the fused image. A maximum frequency loss function is further incorporated to ensure the preservation of critical frequency components. Extensive experiments on three public datasets (MSRS, M3FD, and LLVIP) demonstrate that FSCMF outperforms existing methods in both qualitative and quantitative evaluations, producing fusion results with higher visual consistency and better information retention. Furthermore, additional experiments on object detection and semantic segmentation validate FSCMF's potential in high-level computer vision tasks, highlighting its broad application value. The code of FSCMF is available at https://github.com/boshizhang123/FSCMF.
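To make the frequency-domain idea concrete, the sketch below shows a fixed FFT-based high-frequency boost on a single-channel image. This is only an illustrative stand-in for the learned FSAA frequency branch described in the abstract: the cutoff, gain, and filter shape here are hypothetical choices, whereas FSCMF learns its frequency enhancement end-to-end.

```python
import numpy as np

def enhance_high_freq(img, cutoff=0.25, gain=2.0):
    """Boost high-frequency components of a 2-D image via FFT masking.

    Illustrative sketch only: the real FSAA module is a learned,
    adaptive attention mechanism, not this fixed radial filter.
    """
    # Move to the frequency domain and centre the spectrum.
    spectrum = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.mgrid[0:h, 0:w]
    # Normalized radial distance of each frequency bin from the DC term.
    r = np.sqrt(((yy - h / 2) / h) ** 2 + ((xx - w / 2) / w) ** 2)
    # Amplify bins beyond the cutoff radius (edges, fine texture);
    # leave low frequencies (overall brightness, smooth regions) untouched.
    mask = np.where(r > cutoff, gain, 1.0)
    return np.fft.ifft2(np.fft.ifftshift(spectrum * mask)).real

img = np.random.rand(64, 64)
enhanced = enhance_high_freq(img)
print(enhanced.shape)  # (64, 64)
```

Because the DC bin sits inside the unamplified low-frequency region, the image's mean intensity is preserved while edge contrast increases, which is the qualitative effect the paper attributes to its frequency branch.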