TL;DR: A task-gated multi-expert collaboration network for all-in-one multi-modal image restoration and fusion.
Abstract: Multi-modal image fusion aims to integrate complementary information from different modalities to enhance perceptual capabilities in applications such as rescue and security. However, real-world imaging often suffers from degradation issues, such as noise, blur, and haze in visible imaging, as well as stripe noise in infrared imaging, which significantly degrade fusion performance. To address these challenges, we propose a task-gated multi-expert collaboration network (TG-ECNet) for degraded multi-modal image fusion. The core of our model is a task-aware gating and multi-expert collaboration framework, where the task-aware gating operates in two stages: degradation-aware gating dynamically allocates expert groups for restoration based on degradation type, and fusion-aware gating guides feature integration across modalities to balance information retention between the fusion and restoration tasks. To achieve this, we design a two-stage training strategy that unifies the learning of restoration and fusion. This strategy resolves the inherent conflict in information processing between the two tasks, enabling all-in-one multi-modal image restoration and fusion. Experimental results demonstrate that TG-ECNet significantly enhances fusion performance under diverse complex degradation conditions and improves robustness in downstream applications. The code is available at https://github.com/LeeX54946/TG-ECNet.
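The following minimal PyTorch sketch illustrates the task-gated multi-expert idea described in the abstract; it is not the released TG-ECNet code. The class names (`DegradationGate`, `TaskGatedExperts`), the convolutional expert design, and the per-pixel fusion gate are all illustrative assumptions made for this sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DegradationGate(nn.Module):
    """Degradation-aware gating: predicts a soft assignment over
    restoration expert groups from a degraded feature map."""
    def __init__(self, channels: int, num_experts: int):
        super().__init__()
        self.fc = nn.Linear(channels, num_experts)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        pooled = feat.mean(dim=(2, 3))             # (B, C) global pooling
        return F.softmax(self.fc(pooled), dim=-1)  # (B, E) expert weights

class TaskGatedExperts(nn.Module):
    """Routes features through a weighted combination of restoration
    experts, then balances the two modalities with a fusion gate."""
    def __init__(self, channels: int, num_experts: int = 4):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1),
                          nn.ReLU(inplace=True),
                          nn.Conv2d(channels, channels, 3, padding=1))
            for _ in range(num_experts))
        self.deg_gate = DegradationGate(channels, num_experts)
        # Fusion-aware gating: per-pixel weights over the two modalities.
        self.fusion_gate = nn.Conv2d(2 * channels, 2, kernel_size=1)

    def restore(self, feat: torch.Tensor) -> torch.Tensor:
        w = self.deg_gate(feat)                                 # (B, E)
        outs = torch.stack([e(feat) for e in self.experts], 1)  # (B, E, C, H, W)
        return (w[:, :, None, None, None] * outs).sum(dim=1)

    def forward(self, vis_feat: torch.Tensor, ir_feat: torch.Tensor) -> torch.Tensor:
        vis_r, ir_r = self.restore(vis_feat), self.restore(ir_feat)
        gate = torch.softmax(self.fusion_gate(torch.cat([vis_r, ir_r], 1)), dim=1)
        return gate[:, :1] * vis_r + gate[:, 1:] * ir_r
```

As a usage example, `TaskGatedExperts(64)(vis_feat, ir_feat)` would fuse two 64-channel encoder feature maps; in the actual network the gating presumably operates on learned encoder features rather than raw images.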
Lay Summary: This paper proposes a unified framework for degraded multi-modal image restoration and fusion, which bridges the two tasks through a two-stage training strategy that learns inter-task information while avoiding mutual interference, enabling all-in-one processing (a training-loop sketch follows this summary).
This paper proposes a task-aware gating and multi-expert collaboration module. The degradation-aware gating adapts to different degradation types and selects the optimal expert group for image restoration, while the fusion-aware gating dynamically balances information retention between the fusion and restoration tasks to achieve better fusion performance.
This paper constructs a large-scale degraded multi-modal image fusion benchmark, DeMMI-RF, which contains more than 30,000 multi-modal image pairs spanning diverse degradation types, including data captured from UAV and driving viewpoints. Results on multiple datasets validate the model's superior performance in complex degraded scenarios and its robustness in downstream applications.
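The two-stage training strategy mentioned above could be organized as in the sketch below, under the assumption that stage one optimizes the restoration experts against restoration targets and stage two freezes them while learning the fusion-aware gating. The loss functions, data loader signature, and hyperparameters are placeholders, not the paper's actual objectives.

```python
import torch

def train_two_stage(model, loader, restoration_loss, fusion_loss, epochs=(50, 50)):
    # Stage 1: learn degradation-aware restoration on each modality.
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    for _ in range(epochs[0]):
        for vis, ir, vis_gt, ir_gt in loader:  # tensors stand in for encoder features
            loss = (restoration_loss(model.restore(vis), vis_gt) +
                    restoration_loss(model.restore(ir), ir_gt))
            opt.zero_grad(); loss.backward(); opt.step()

    # Stage 2: freeze the restoration experts, learn fusion-aware gating,
    # so fusion learning cannot interfere with restoration.
    for p in model.experts.parameters():
        p.requires_grad_(False)
    opt = torch.optim.Adam(
        (p for p in model.parameters() if p.requires_grad), lr=1e-4)
    for _ in range(epochs[1]):
        for vis, ir, vis_gt, ir_gt in loader:
            fused = model(vis, ir)
            loss = fusion_loss(fused, vis_gt, ir_gt)
            opt.zero_grad(); loss.backward(); opt.step()
```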
Link To Code: https://github.com/LeeX54946/TG-ECNet
Primary Area: Applications->Computer Vision
Keywords: Degraded Multi-modal Image Fusion, Task-aware Gating, Multi-expert Collaboration
Submission Number: 3251