CLED-Fusion: Controllable and Latent-Explainable Diffusion for Multi-Degradation Multi-Modal Image Fusion
Keywords: Multi-modal learning; Image fusion; Diffusion model
Abstract: Multi-modal image fusion aims to combine complementary information from different modalities, yet its deployment is hindered by diverse degradations (\eg low-light, blur, haze, noise). Existing methods focus mainly on feature integration and offer neither controllability over degradation removal nor explainability of the generative process. We propose a novel \textbf{C}ontrollable and \textbf{L}atent-\textbf{E}xplainable \textbf{D}iffusion framework for multi-degradation \textbf{Fusion} (\textbf{CLED-Fusion}). CLED-Fusion introduces shared distribution priors that unify heterogeneous degradations into a consistent latent space, enabling controllable regulation of removal strength and cross-modal balance. The diffusion dynamics are reformulated as a dual process in which a deterministic residual pathway removes degradations while a stochastic noise pathway preserves fine details, yielding an interpretable generative trajectory. An explicit degrade-fusion module embeds these priors directly into the degraded inputs, avoiding redundant reconstruction and ensuring efficiency. Extensive experiments on multiple benchmarks show that CLED-Fusion achieves superior fusion quality, robustness to degradations, and strong adaptability in medical imaging scenarios.
The code is available at \url{https://anonymous.4open.science/r/CLED-Fusion-D88C/}.
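Below is a minimal, hypothetical sketch of the dual-process reverse step described in the abstract: a deterministic residual pathway that subtracts a predicted degradation residual, and a stochastic noise pathway that re-injects scaled noise to preserve fine details, with a knob for removal strength. All names here (`dual_path_step`, `residual_net`, the `alpha`/`sigma` schedule) are illustrative assumptions, not the authors' released implementation; see the repository linked above for the actual code.

```python
import torch


def dual_path_step(x_t, residual_net, t, alpha, sigma, removal_strength=1.0):
    """One hypothetical reverse step of the dual-process formulation.

    x_t:              current latent, shape (B, C, H, W)
    residual_net:     network predicting the degradation residual at step t (assumed interface)
    alpha, sigma:     per-step coefficients of an assumed schedule
    removal_strength: controllable scale on how much residual is removed
    """
    # Deterministic pathway: remove the predicted degradation residual.
    residual = residual_net(x_t, t)
    x_clean = x_t - removal_strength * alpha * residual

    # Stochastic pathway: add scaled Gaussian noise to retain fine detail.
    noise = torch.randn_like(x_t)
    return x_clean + sigma * noise


if __name__ == "__main__":
    # Toy usage: a stand-in predictor and a short three-step schedule.
    net = lambda x, t: 0.1 * x
    x = torch.randn(1, 4, 32, 32)
    for step, (a, s) in enumerate(zip([0.8, 0.6, 0.4], [0.05, 0.02, 0.0])):
        x = dual_path_step(x, net, step, a, s, removal_strength=0.7)
    print(x.shape)  # torch.Size([1, 4, 32, 32])
```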
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 10024