Keywords: Image fusion, multimodal images, degradation
Abstract: Current image fusion methods struggle with real-world composite degradations and lack the flexibility to accommodate user-specific needs. To address this, we propose ControlFusion, a controllable fusion network guided by language-vision prompts that adaptively mitigates composite degradations. On the one hand, we construct a degraded imaging model based on physical mechanisms, such as the Retinex theory and the atmospheric scattering principle, to simulate composite degradations and provide a data foundation for addressing realistic degradations. On the other hand, we devise a prompt-modulated restoration and fusion network that dynamically enhances features according to degradation prompts, enabling adaptability to varying degradation levels. To support user-specific preferences in visual quality, a text encoder is incorporated to embed user-defined degradation types and levels as degradation prompts. Moreover, a spatial-frequency collaborative visual adapter is designed to autonomously perceive degradations from source images, thereby reducing complete reliance on user instructions. Extensive experiments demonstrate that ControlFusion outperforms state-of-the-art fusion methods in fusion quality and degradation handling, particularly under real-world and composite degradations.
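As a rough illustration (not drawn from the paper itself), the two physical mechanisms named in the abstract are commonly written as follows; the specific composite degradation model in ControlFusion may combine or parameterize them differently:

Retinex decomposition for low-light imaging: $I(x) = R(x) \odot L(x)$, where $R$ is the reflectance map and $L$ the illumination map.

Atmospheric scattering model for haze: $I(x) = J(x)\,t(x) + A\,\bigl(1 - t(x)\bigr)$, where $J$ is the clear scene radiance, $t$ the transmission map, and $A$ the global atmospheric light.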
Supplementary Material: zip
Primary Area: Applications (e.g., vision, language, speech and audio, Creative AI)
Submission Number: 20382