Multi-modality Image Fusion under Adverse Weather: Mask-Guided Feature Restoration and Interaction

ICLR 2026 Conference Submission 313 Authors

01 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Multi-modality Image Fusion, Complex Scenes, Feature Interaction
Abstract: Multi-modality image fusion (MMIF) enhances scene representation by exploiting complementary cues from different modalities. Adverse weather, however, causes significant image degradation, disrupting feature representation and requiring simultaneous feature restoration and cross-modal complementarity. Existing methods often struggle with effective representation learning under such conditions, limiting their practical performance. To address these challenges, we propose a mask-guided MMIF method that integrates feature restoration and interaction. We first introduce "Pseudo Ground Truth" to simplify training, promoting faster and more effective feature learning. Then, we design a mask generation mechanism based on the mapping relationship between the fused result and the source images, quantifying the relative contribution of each modality during the fusion process. By incorporating the proposed mask-guided cross-modal cross-attention mechanism, the network is encouraged to selectively attend to informative features during modality interaction, mitigating the risk of overfitting to the static distribution of the "Pseudo Ground Truth". Additionally, we propose a mask-guided and a task-coupled degradation-aware strategy to balance feature restoration and interaction. Extensive experiments on synthetic and real-world datasets (rain, haze, and snow) demonstrate that our method surpasses state-of-the-art approaches in visual quality, quantitative metrics, and downstream tasks.
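To make the mask-guided cross-modal cross-attention idea concrete, the following is a minimal PyTorch sketch, not the authors' implementation: the module name MaskGuidedCrossAttention, the flattened (B, N, C) feature layout, and the log-mask attention bias are illustrative assumptions about how a per-pixel contribution mask could re-weight attention from one modality onto the other.

```python
import torch
import torch.nn as nn

class MaskGuidedCrossAttention(nn.Module):
    """Illustrative sketch: queries from modality A attend to keys/values of
    modality B, with a contribution mask biasing attention toward positions
    where modality B is estimated to contribute more to the fused result."""
    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.num_heads = num_heads
        self.scale = (dim // num_heads) ** -0.5
        self.q = nn.Linear(dim, dim)
        self.kv = nn.Linear(dim, dim * 2)
        self.proj = nn.Linear(dim, dim)

    def forward(self, feat_a, feat_b, mask_b):
        # feat_a, feat_b: (B, N, C) flattened spatial features of the two modalities
        # mask_b: (B, N) relative-contribution mask of modality B, values in [0, 1]
        B, N, C = feat_a.shape
        h = self.num_heads
        q = self.q(feat_a).reshape(B, N, h, C // h).transpose(1, 2)          # (B, h, N, C/h)
        k, v = self.kv(feat_b).reshape(B, N, 2, h, C // h).permute(2, 0, 3, 1, 4)
        attn = (q @ k.transpose(-2, -1)) * self.scale                        # (B, h, N, N)
        # Additive log-mask bias: positions with larger contribution receive more attention
        attn = attn + torch.log(mask_b.clamp(min=1e-6))[:, None, None, :]
        attn = attn.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)

# Hypothetical usage: 8x8 feature maps flattened to N=64 tokens with C=32 channels
module = MaskGuidedCrossAttention(dim=32)
fa, fb = torch.randn(2, 64, 32), torch.randn(2, 64, 32)
mask = torch.rand(2, 64)
fused = module(fa, fb, mask)  # (2, 64, 32)
```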
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 313