DCAFuse: Dual-Branch Diffusion-CNN Complementary Feature Aggregation Network for Multi-Modality Image Fusion

Published: 20 Jul 2024, Last Modified: 21 Jul 2024 · MM2024 Poster · CC BY 4.0
Abstract: Multi-modality image fusion (MMIF) aims to integrate the complementary features of source images, including target saliency and texture specifics, into the fused image. Recently, image fusion methods leveraging diffusion models have demonstrated commendable results. Despite their strengths, diffusion models have a limited capability to perceive local features; moreover, their inherent working mechanism of adding noise to the inputs leads to a loss of original information. To overcome these problems, we propose a novel Diffusion-CNN feature Aggregation Fusion (DCAFuse) network that extracts complementary features from dual branches and aggregates them effectively. Specifically, we utilize the denoising diffusion probabilistic model (DDPM) in the diffusion-based branch to construct global information, and multi-scale convolutional kernels in the CNN-based branch to extract local detailed features. Afterward, we design a novel complementary feature aggregation module (CFAM). By constructing coordinate attention maps for the concatenated features, CFAM captures long-range dependencies in both horizontal and vertical directions, thereby dynamically guiding the aggregation weights of the branches. In addition, to further improve the complementarity of the dual-branch features, we introduce a novel loss function based on cosine similarity and a unique denoising timestep selection strategy. Extensive experimental results show that our proposed DCAFuse outperforms other state-of-the-art methods in multiple image fusion tasks, including infrared and visible image fusion (IVF) and medical image fusion (MIF). The source code will be publicly available at https://xxx/xxx/xxx.
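The abstract names two mechanisms: CFAM, which builds coordinate attention maps (directional pooling along height and width) over the concatenated branch features to derive per-branch aggregation weights, and a cosine-similarity loss that encourages the two branches to carry complementary (near-orthogonal) features. Since the source code is not yet released, the following is only a minimal NumPy sketch of those two ideas; all function names, weight shapes, and the exact way attention is split into branch weights are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def coordinate_attention(feat, w_shared, w_h, w_w):
    """Directional attention over a (C, H, W) feature map.

    Pools along width and height separately, mixes both pooled
    descriptors through a shared transform, then emits per-row and
    per-column attention vectors (the 'coordinate attention' idea).
    """
    C, H, W = feat.shape
    pooled_h = feat.mean(axis=2)                          # (C, H): pool along width
    pooled_w = feat.mean(axis=1)                          # (C, W): pool along height
    joint = np.concatenate([pooled_h, pooled_w], axis=1)  # (C, H + W)
    hidden = np.maximum(w_shared @ joint, 0.0)            # shared 1x1-style transform + ReLU
    a_h = sigmoid(w_h @ hidden[:, :H])                    # (C, H) row attention
    a_w = sigmoid(w_w @ hidden[:, H:])                    # (C, W) column attention
    return a_h, a_w

def cfam_fuse(f_diff, f_cnn, params):
    """Aggregate diffusion-branch and CNN-branch features.

    Concatenates both branches, computes a coordinate attention map,
    and uses its two channel halves as dynamic per-branch weights.
    """
    cat = np.concatenate([f_diff, f_cnn], axis=0)         # (2C, H, W)
    a_h, a_w = coordinate_attention(cat, *params)
    attn = a_h[:, :, None] * a_w[:, None, :]              # (2C, H, W) full attention map
    C = f_diff.shape[0]
    w_diff, w_cnn = attn[:C], attn[C:]                    # split back per branch
    return w_diff * f_diff + w_cnn * f_cnn

def complementarity_loss(f_diff, f_cnn, eps=1e-8):
    """Cosine-similarity penalty: low when branch features are near-orthogonal."""
    a, b = f_diff.ravel(), f_cnn.ravel()
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps)
    return np.abs(cos)
```

Minimizing `complementarity_loss` pushes the two branches apart in feature space, which matches the stated goal of improving dual-branch complementarity; the real loss may of course be computed per layer or per channel rather than on flattened features.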
Primary Subject Area: [Content] Multimodal Fusion
Secondary Subject Area: [Content] Multimodal Fusion
Relevance To Conference: We propose the Dual-Branch Diffusion-CNN Complementary Feature Aggregation Network (DCAFuse), which offers a substantial contribution to the Multimodal Fusion area of multimedia/multimodal processing. Specifically, DCAFuse effectively integrates the complementary features of different source images, including target saliency and texture features, into the fused image. In both infrared and visible image fusion (IVF) and medical image fusion (MIF) tasks, DCAFuse achieves leading performance compared with SOTA methods. IVF has been widely utilized in autonomous driving, drone nighttime monitoring, video surveillance, etc., and MIF assists doctors in medical diagnosis and treatment, making it an indispensable part of modern medical imaging. Our proposed method handles these tasks effectively and can inspire future work; therefore, it will further advance multimodal fusion.
Supplementary Material: zip
Submission Number: 4196