Abstract: Fusing the complementary information in visible and infrared images offers a promising way to enhance the performance of downstream computer vision tasks (e.g., object detection, segmentation) under complicated imaging conditions (e.g., low illumination). However, because the infrared sensor has robust imaging capacity, most existing methods rely primarily on the salient object intensity information in the infrared modality, while the visible information (e.g., color, texture) is not adequately utilized, limiting their generalization capacity. In this study, we present a novel image fusion framework, MCInet, which attempts to Maximize and merge the Complementary Information across visible and infrared modalities for more informative image fusion. To this end, we first introduce modality-specific processing modules to improve the information representation of each modality. For visible images, a pretrained low-light enhancement module is adopted to enhance their color and texture. For infrared images, a nonlinear mapping module is constructed to suppress excessive salient object intensity. We then establish a reusable MCI block that embeds a cross-image mutual information minimization scheme into an input-aware fusion module, which allows us to dynamically maximize and merge the complementary information between two input images according to their feature representations. In addition, we introduce a cycle reconstruction loss to regularize the fusion results in a self-supervised manner. Experiments on image fusion, object detection, and segmentation demonstrate that the proposed framework produces more informative fusion results and exhibits better performance in downstream tasks.
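The ingredients the abstract names (mutual information between the two modalities, input-aware fusion weighting, and a cycle reconstruction loss) can be caricatured in a minimal NumPy sketch. This is not MCInet: every function below is a hypothetical stand-in (a histogram MI estimate in place of a learned estimator, gradient-magnitude saliency in place of the learned input-aware weights, and an MSE proxy for the cycle reconstruction loss), intended only to make the quantities concrete.

```python
import numpy as np

def mutual_information(a, b, bins=16):
    """Toy histogram-based MI estimate between two images scaled to [0, 1].
    The paper would use a learned cross-image MI minimization scheme instead."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(),
                                 bins=bins, range=[[0, 1], [0, 1]])
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal over rows
    py = pxy.sum(axis=0, keepdims=True)   # marginal over columns
    nz = pxy > 0                          # avoid log(0)
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

def input_aware_fusion(vis, ir, alpha=5.0):
    """Per-pixel softmax weighting driven by local gradient magnitude,
    a crude stand-in for the input-aware fusion inside the MCI block."""
    def grad_mag(x):
        gy, gx = np.gradient(x)
        return np.sqrt(gx ** 2 + gy ** 2)
    w_vis = np.exp(alpha * grad_mag(vis))
    w_ir = np.exp(alpha * grad_mag(ir))
    w = w_vis / (w_vis + w_ir)            # weights sum to 1 per pixel
    return w * vis + (1 - w) * ir

def cycle_reconstruction_loss(fused, vis, ir):
    """Self-supervised proxy: the fused image should remain reconstructable
    toward both source modalities (here measured by plain MSE)."""
    return float(np.mean((fused - vis) ** 2) + np.mean((fused - ir) ** 2))
```

In this toy setting, minimizing the mutual information between the fused result and each individual input pushes the fusion to retain information the other modality lacks, which is the intuition behind maximizing complementary information.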
External IDs: dblp:journals/tmm/NieWWZZ25