Abstract: Owing to the lack of reference images for training infrared and visible image fusion (IVIF) networks, deep learning models cannot fuse the modal features of different source images well, yielding fusion results that are biased toward one modality. This study proposes an IVIF method based on a dual-supervised mask generation network (DSMGN) that comprises three parts: an encoder–decoder-based backbone network and two image-generation branches. In the backbone network, multiple residual dense involution blocks are constructed to extract the salient features of infrared images so that an accurate mask image can be generated. The generated mask then defines the fusion strategy used to obtain the fusion result. To address the absence of reference images for network training, the two image-generation branches are designed based on the gray-level co-occurrence matrix (GLCM) and Gaussian blur, producing two images that emphasize different features of the source images. These two generated images define a joint loss function that supervises the network training. Extensive experiments indicate that, compared with several state-of-the-art methods, DSMGN achieves better fusion results in both subjective and objective terms.
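To make the dual-supervision idea concrete, the following is a minimal sketch (not the authors' implementation) of how GLCM- and Gaussian-blur-based pseudo-reference images might be generated. All parameter choices (patch size, GLCM distances/angles, gray-level quantization, blur sigma) and the weighted-mixing scheme are illustrative assumptions, since the abstract does not specify them.

```python
# Sketch of the two pseudo-reference generators implied by the abstract.
# Assumes 8-bit grayscale inputs whose height/width are multiples of `patch`.
import numpy as np
from scipy.ndimage import gaussian_filter
from skimage.feature import graycomatrix, graycoprops

def glcm_contrast_map(img_u8, patch=16, levels=32):
    """Per-patch GLCM contrast of an 8-bit grayscale image,
    used here as a texture-saliency proxy (assumption)."""
    q = (img_u8 // (256 // levels)).astype(np.uint8)  # quantize gray levels
    h, w = q.shape
    out = np.zeros((h // patch, w // patch), dtype=np.float64)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            p = q[i * patch:(i + 1) * patch, j * patch:(j + 1) * patch]
            g = graycomatrix(p, distances=[1], angles=[0],
                             levels=levels, symmetric=True, normed=True)
            out[i, j] = graycoprops(g, 'contrast')[0, 0]
    return out / (out.max() + 1e-12)  # normalize weights to [0, 1]

def pseudo_references(ir_u8, vis_u8, patch=16, sigma=2.0):
    """Two supervision images: one biased toward IR texture saliency
    (GLCM-weighted mix), one toward smoothed visible content (Gaussian blur)."""
    w = np.kron(glcm_contrast_map(ir_u8, patch), np.ones((patch, patch)))
    ref_texture = w * ir_u8 + (1.0 - w) * vis_u8               # GLCM-guided mix
    ref_smooth = gaussian_filter(vis_u8.astype(np.float64), sigma)  # blurred visible
    return ref_texture, ref_smooth
```

Each generated image favors a different aspect of the sources (IR texture saliency versus smoothed visible content), so a joint loss against both could, as the abstract describes, push the fused output away from a single-modality bias.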