CLED-Fusion: Controllable and Latent-Explainable Diffusion for Multi-Degradation Multi-Modal Image Fusion
Keywords: Multi-modal learning; Image fusion; Diffusion model
Abstract: Multi-modal image fusion aims to combine complementary information from different modalities, yet its deployment is hindered by diverse degradations (\eg low-light, blur, haze, noise). Existing methods focus mainly on feature integration and offer neither controllability over degradation removal nor explainability of the generative process. We propose a novel \textbf{C}ontrollable and \textbf{L}atent-\textbf{E}xplainable \textbf{D}iffusion framework for multi-degradation \textbf{Fusion} (\textbf{CLED-Fusion}). CLED-Fusion introduces shared distribution priors that unify heterogeneous degradations into a consistent latent space, enabling controllable regulation of removal strength and cross-modal balance. The diffusion dynamics are reformulated as a dual process in which a deterministic residual pathway removes degradations while a stochastic noise pathway preserves fine details, yielding an interpretable generative trajectory. An explicit degrade-fusion module embeds these priors directly into the degraded inputs, avoiding redundant reconstruction and ensuring efficiency. Extensive experiments on multiple benchmarks show that CLED-Fusion achieves superior fusion quality, robustness to degradations, and strong adaptability in medical imaging scenarios.
The code is available at \url{https://anonymous.4open.science/r/CLED-Fusion-D88C/}.
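Below is a minimal, hypothetical sketch of the dual-process reverse step described in the abstract: a deterministic residual pathway that subtracts a predicted degradation residual, and a stochastic noise pathway that re-injects scaled noise to preserve fine details, with a knob for removal strength. All names here (`dual_path_step`, `residual_net`, the `alpha`/`sigma` schedule) are illustrative assumptions, not the authors' released implementation; see the repository linked above for the actual code.

```python
import torch


def dual_path_step(x_t, residual_net, t, alpha, sigma, removal_strength=1.0):
    """One hypothetical reverse step of the dual-process formulation.

    x_t:              current latent, shape (B, C, H, W)
    residual_net:     network predicting the degradation residual at step t (assumed interface)
    alpha, sigma:     per-step coefficients of an assumed schedule
    removal_strength: controllable scale on how much residual is removed
    """
    # Deterministic pathway: remove the predicted degradation residual.
    residual = residual_net(x_t, t)
    x_clean = x_t - removal_strength * alpha * residual

    # Stochastic pathway: add scaled Gaussian noise to retain fine detail.
    noise = torch.randn_like(x_t)
    return x_clean + sigma * noise


if __name__ == "__main__":
    # Toy usage: a stand-in predictor and a short three-step schedule.
    net = lambda x, t: 0.1 * x
    x = torch.randn(1, 4, 32, 32)
    for step, (a, s) in enumerate(zip([0.8, 0.6, 0.4], [0.05, 0.02, 0.0])):
        x = dual_path_step(x, net, step, a, s, removal_strength=0.7)
    print(x.shape)  # torch.Size([1, 4, 32, 32])
```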
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 10024