DANet: A Dual-Branch Framework With Diffusion-Integrated Autoencoder for Infrared-Visible Image Fusion
Abstract: Infrared (IR) and visible image fusion has attracted wide attention because it combines the complementary strengths of the two sensors while compensating for the weaknesses of either one alone. However, existing methods often neglect the differing spatial-frequency content of IR and visible images and struggle to extract global and local information simultaneously, which leads to less comprehensive feature representations. To address these issues, we incorporate a diffusion model as a module and propose DANet, a new dual-branch fusion network. Specifically, an encoder composed of Transformer blocks and an invertible neural network (INN) block extracts detailed features, while a diffusion module extracts latent features, forcing the model to capture information at different scales through a step-by-step process of adding and removing noise. The resulting features are then fused and passed to a decoder that reconstructs the fused image. We evaluate the proposed method on three datasets, TNO, RoadScene, and MSRS, using several metrics: it achieves spatial frequency (SF) values of 13.224, 16.821, and 15.022 and gradient-based fusion performance (Qabf) values of 0.496, 0.471, and 0.723, respectively, and ranks first or second on the remaining metrics. Extensive experiments show that DANet outperforms representative state-of-the-art methods in both qualitative and quantitative assessments, and sufficient ablation experiments validate the effectiveness of each module. We further demonstrate that DANet enhances downstream IR-visible object detection performance on the M3FD dataset. Our code and fused results will be accessible at https://github.com/Pancy9476/DANet.
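The abstract describes the dual-branch layout only at a high level. Below is a minimal PyTorch sketch of that structure, assuming single-channel (grayscale) inputs and toy layer sizes; the class names (DetailEncoder, LatentDiffusionEncoder, DANetSketch) and all internal layers are illustrative assumptions, not the authors' implementation, which will be released at the GitHub link above.

```python
import torch
import torch.nn as nn


class DetailEncoder(nn.Module):
    """Detail branch: a lightweight Transformer block followed by an additive-coupling
    (INN-style) block. Hypothetical stand-in for the paper's Transformer + INN encoder."""
    def __init__(self, dim=32):
        super().__init__()
        self.embed = nn.Conv2d(1, dim, 3, padding=1)
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        # Coupling sub-network for one additive INN step (channels split in half).
        self.coupling = nn.Sequential(
            nn.Conv2d(dim // 2, dim // 2, 3, padding=1), nn.ReLU(),
            nn.Conv2d(dim // 2, dim // 2, 3, padding=1),
        )

    def forward(self, x):
        f = self.embed(x)                              # B x C x H x W
        b, c, h, w = f.shape
        tokens = f.flatten(2).transpose(1, 2)          # B x HW x C
        tokens = self.norm(tokens + self.attn(tokens, tokens, tokens)[0])
        f = tokens.transpose(1, 2).view(b, c, h, w)
        f1, f2 = f.chunk(2, dim=1)                     # additive coupling: y2 = f2 + g(f1)
        return torch.cat([f1, f2 + self.coupling(f1)], dim=1)


class LatentDiffusionEncoder(nn.Module):
    """Latent branch: perturb the input with Gaussian noise at a random timestep and
    extract features with a small denoising CNN. A toy proxy for the diffusion module."""
    def __init__(self, dim=32, timesteps=1000):
        super().__init__()
        self.timesteps = timesteps
        betas = torch.linspace(1e-4, 0.02, timesteps)
        self.register_buffer("alpha_bar", torch.cumprod(1.0 - betas, dim=0))
        self.denoise = nn.Sequential(
            nn.Conv2d(1, dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(dim, dim, 3, padding=1),
        )

    def forward(self, x):
        t = torch.randint(0, self.timesteps, (x.size(0),), device=x.device)
        a = self.alpha_bar[t].view(-1, 1, 1, 1)
        noisy = a.sqrt() * x + (1 - a).sqrt() * torch.randn_like(x)  # forward diffusion step
        return self.denoise(noisy)                                   # latent features


class DANetSketch(nn.Module):
    """Run both branches on each modality, concatenate the features, and decode
    them into a single fused image."""
    def __init__(self, dim=32):
        super().__init__()
        self.detail = DetailEncoder(dim)
        self.latent = LatentDiffusionEncoder(dim)
        self.decoder = nn.Sequential(
            nn.Conv2d(4 * dim, dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(dim, 1, 3, padding=1), nn.Tanh(),
        )

    def forward(self, ir, vis):
        feats = [self.detail(ir), self.detail(vis),
                 self.latent(ir), self.latent(vis)]
        return self.decoder(torch.cat(feats, dim=1))


if __name__ == "__main__":
    ir, vis = torch.rand(1, 1, 64, 64), torch.rand(1, 1, 64, 64)
    print(DANetSketch()(ir, vis).shape)  # torch.Size([1, 1, 64, 64])
```

The sketch only mirrors the data flow stated in the abstract (detail branch + latent diffusion branch feeding a shared decoder); loss functions, the full denoising schedule, and training details are not specified here.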