GLAD: A Global-Attention-Based Diffusion Model for Infrared and Visible Image Fusion

Haozhe Guo, Mengjie Chen, Kaijiang Li, Hao Su, Pei Lv

Published: 2024, Last Modified: 13 Feb 2025, Venue: ICIC (7) 2024, License: CC BY-SA 4.0

Abstract: Infrared and visible image fusion (IVIF) is a widely used approach to enhance scene understanding by fusing the salient information of infrared images with the texture details of visible images. Existing methods typically focus on extracting local feature maps between connected layers while ignoring global features, which causes a loss of fine-grained detail (e.g., blurred textures and edges) in the fused images. To address this issue, we propose GLAD (GLobal-Attention-based Diffusion model), a novel IVIF approach that produces high-quality fused images with fine-grained details. In GLAD, we first tailor the denoising network of the diffusion model to learn the joint distribution of multi-channel data. Next, we propose a global attention fusion module that synthesizes the global features extracted from the denoising network into a fine-grained fused image. Moreover, considering the influence of illumination, we design a fusion loss function to improve the denoising network for the IVIF task. Qualitative and quantitative experiments demonstrate that GLAD outperforms other state-of-the-art methods by 7.08% on the MSRS dataset.
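
The abstract does not specify how the global attention fusion module is built, so the following is only a minimal sketch of the general idea of global attention, not the authors' implementation. It assumes each spatial position of the infrared and visible feature maps is a small feature vector, concatenates the two modalities per position, and applies plain scaled dot-product self-attention with identity query/key/value projections, so that every output position mixes information from all positions rather than a local neighborhood. The function names `global_attention` and `fuse_features` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def global_attention(tokens):
    """Scaled dot-product self-attention over all positions.

    Assumption: identity Q/K/V projections (a real module would learn them).
    tokens: list of equal-length feature vectors, one per spatial position.
    """
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    out = []
    for q in tokens:
        # Similarity of this position to every position in the image.
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in tokens]
        weights = softmax(scores)
        # Weighted average of all positions' features (global context).
        out.append([sum(w * v[d] for w, v in zip(weights, tokens))
                    for d in range(dim)])
    return out


def fuse_features(ir_tokens, vis_tokens):
    """Hypothetical fusion: concatenate modalities per position, then attend."""
    joint = [ir + vis for ir, vis in zip(ir_tokens, vis_tokens)]
    return global_attention(joint)
```

Because the attention weights form a convex combination, each fused feature stays within the range of the input features at the same channel; a learned module would add projections and a decoder to map the result back to an image.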