MulFS-CAP: Multimodal Fusion-Supervised Cross-Modality Alignment Perception for Unregistered Infrared-Visible Image Fusion
Abstract: We propose Multimodal Fusion-supervised Cross-modality Alignment Perception (MulFS-CAP), a novel framework for single-stage fusion of unregistered infrared-visible images. Traditional two-stage methods rely on an explicit registration algorithm to spatially align the source images before fusing them, which often adds complexity. In contrast, MulFS-CAP integrates implicit registration into the fusion itself, simplifying the pipeline and making it better suited to practical applications. MulFS-CAP uses a shared shallow feature encoder to fuse unregistered infrared-visible images in a single stage. To meet the specific requirements of feature-level alignment and fusion, we develop a consistent feature learning approach based on a learnable modality dictionary. This dictionary supplies complementary information to the unimodal features, thereby maintaining consistency between the individual unimodal features and the fused multimodal features. As a result, MulFS-CAP reduces the impact of modality variance on cross-modality feature alignment, allowing registration and fusion to be performed simultaneously. In addition, we introduce a novel cross-modality alignment approach that builds a correlation matrix describing pixel relationships between the source images. This matrix is used to align features across the infrared and visible images, further refining the fusion process. Together, these designs make MulFS-CAP lightweight, effective, and free of explicit registration. Experimental results on different datasets demonstrate the effectiveness of the proposed method and its superiority over state-of-the-art two-stage methods.
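The correlation-matrix idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify how the matrix is computed or applied, so the code below assumes cosine similarity between per-pixel feature vectors, a softmax with a hypothetical `temperature` parameter, and an arbitrary choice of the infrared image as the reference grid. All function names are illustrative.

```python
import math


def l2_normalize(vec):
    """Scale a feature vector to unit length (left unchanged if all zeros)."""
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def correlation_matrix(ir_feats, vis_feats):
    """Cosine similarity between every infrared pixel and every visible pixel.

    Each argument is a list of per-pixel feature vectors (flattened H*W).
    Entry [i][j] relates infrared pixel i to visible pixel j.
    """
    ir_n = [l2_normalize(f) for f in ir_feats]
    vis_n = [l2_normalize(f) for f in vis_feats]
    return [[sum(a * b for a, b in zip(q, k)) for k in vis_n] for q in ir_n]


def softmax(row, temperature):
    """Numerically stable softmax over one row of the correlation matrix."""
    peak = max(row)
    exps = [math.exp((x - peak) / temperature) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def align_visible_to_infrared(ir_feats, vis_feats, temperature=0.05):
    """Resample visible features onto the infrared pixel grid.

    Assumption: alignment is a correlation-weighted average of visible
    features; the paper may use a different mechanism.
    """
    corr = correlation_matrix(ir_feats, vis_feats)
    weights = [softmax(row, temperature) for row in corr]
    dim = len(vis_feats[0])
    aligned = [
        [sum(w * f[d] for w, f in zip(w_row, vis_feats)) for d in range(dim)]
        for w_row in weights
    ]
    return aligned, corr


# Toy example: the "infrared" pixels are a shuffled copy of the "visible" ones,
# which stands in for spatial misalignment between the two source images.
vis = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ir = [[0.0, 1.0], [1.0, 1.0], [1.0, 0.0]]
aligned, corr = align_visible_to_infrared(ir, vis)
```

In this toy case each row of `corr` peaks at the visible pixel that matches the infrared pixel, so `aligned` approximately reproduces `ir`. In a real network the features would come from the shared encoder, and the dense matrix would grow quadratically with the number of pixels, which is one reason such correlations are usually computed on downsampled feature maps or local windows.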
External IDs: dblp:journals/pami/LiYZJYL25