Dif-GAN: A Generative Adversarial Network with Multi-Scale Attention and Diffusion Models for Infrared-Visible Image Fusion

Published: 2024 · Last Modified: 13 May 2025 · ISPA 2024 · CC BY-SA 4.0
Abstract: Infrared and visible images are fused to obtain a single image with richer information. Most current fusion techniques produce acceptable results, but they fall short in extracting information from the source images, so the fused image fails to adequately account for both thermal radiation regions and texture details: the texture detail of the visible source dominates the thermal target information of the infrared source in the final fused image, or vice versa. Since features at a single scale cannot adequately capture the spatial details of complex scenes, a multi-scale attention network is used to extract deep features from the source images. For latent-variable problems, the Expectation-Maximization (EM) algorithm yields maximum likelihood estimates; here it both stabilizes the training of the Generative Adversarial Network (GAN) and helps compensate for the lack of labels in infrared-visible image fusion. Although the EM framework greatly improves the training stability of the GAN, it yields only a small improvement in fusion quality on its own. A diffusion model is therefore introduced into the generator to capture the latent joint structure shared by the infrared and visible images. Extensive experiments show that Dif-GAN outperforms state-of-the-art methods.
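To make the multi-scale feature extraction concrete, the sketch below shows one common way such a block can be built: parallel convolutions with growing receptive fields, fused by a squeeze-and-excitation-style channel-attention gate. The class name `MultiScaleAttention`, the kernel sizes, and the gating design are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class MultiScaleAttention(nn.Module):
    """Hypothetical sketch of a multi-scale attention block; the
    paper's actual network may differ."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        # Three branches with growing receptive fields (3x3, 5x5, 7x7).
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, kernel_size=k, padding=k // 2)
            for k in (3, 5, 7)
        ])
        # Channel attention over the concatenated multi-scale features.
        fused_ch = 3 * out_ch
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                # squeeze: global context
            nn.Conv2d(fused_ch, fused_ch // 4, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(fused_ch // 4, fused_ch, 1),
            nn.Sigmoid(),                           # per-channel weights
        )
        self.project = nn.Conv2d(fused_ch, out_ch, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = torch.cat([b(x) for b in self.branches], dim=1)
        return self.project(feats * self.attn(feats))

if __name__ == "__main__":
    # Usage: channel-wise concatenation of a grayscale infrared/visible pair.
    ir, vis = torch.randn(1, 1, 64, 64), torch.randn(1, 1, 64, 64)
    block = MultiScaleAttention(in_ch=2, out_ch=32)
    print(block(torch.cat([ir, vis], dim=1)).shape)  # torch.Size([1, 32, 64, 64])
```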
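For reference, the standard EM updates the abstract invokes are, for observations $X$, latent variables $Z$, and parameters $\theta$ (how the paper maps fusion quantities onto $X$, $Z$, and $\theta$ is not specified here):

```latex
\begin{aligned}
\text{E-step:}\quad & Q\!\left(\theta \mid \theta^{(t)}\right)
  = \mathbb{E}_{Z \sim p\left(\cdot \mid X,\, \theta^{(t)}\right)}
    \left[\log p(X, Z \mid \theta)\right] \\
\text{M-step:}\quad & \theta^{(t+1)}
  = \arg\max_{\theta}\; Q\!\left(\theta \mid \theta^{(t)}\right)
\end{aligned}
```

Each iteration is guaranteed not to decrease the observed-data likelihood, which is the property that makes EM attractive when direct maximum likelihood estimation is intractable due to latent variables.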
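As background for the diffusion component, a standard DDPM-style forward (noising) process, on which diffusion-based generators typically build, is given below; the paper's exact formulation may differ:

```latex
q\!\left(x_t \mid x_0\right)
  = \mathcal{N}\!\left(x_t;\ \sqrt{\bar{\alpha}_t}\, x_0,\ \left(1 - \bar{\alpha}_t\right)\mathbf{I}\right),
\qquad
\bar{\alpha}_t = \prod_{s=1}^{t} \left(1 - \beta_s\right)
```

Here $\beta_s$ is the noise schedule; a denoising network trained to invert this process can then model the joint structure of the conditioning images.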