Keywords: Diffusion model, Underwater image enhancement, VLM
Abstract: Underwater images often suffer from severe color distortion and texture degradation caused by light absorption and scattering, posing significant challenges for visual perception and restoration. Recent diffusion-based underwater image enhancement (UIE) methods have shown remarkable performance, but most rely on customized architectures trained from scratch or lack auxiliary guidance beyond image-level inputs, which limits model generalization and controllability. In this work, we propose CoDe, a UIE framework for semantic Color reasoning and high-fidelity Detail synthesis that fully leverages the synergy between diffusion models and vision-language models. It explicitly disentangles the color and texture of underwater images: a fine-tuned LLaVA provides domain-invariant semantic color cues for robust color correction, while an SDXL-based generator restores high-frequency details for sharp reconstruction. Furthermore, we design an adaptive degradation-aware feature modulation module that fuses underwater and clean-domain representations, effectively suppressing noise interference during the denoising diffusion process. Extensive experiments on multiple underwater benchmarks demonstrate that CoDe achieves superior performance, significantly improving both color fidelity and texture preservation.
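The abstract does not specify how the adaptive degradation-aware feature modulation fuses the two domains. As a minimal illustrative sketch only, one plausible form is a learned, degradation-conditioned gate that blends underwater-domain and clean-domain feature maps; all names here (DegradationAwareModulation, gate_proj, fuse) are hypothetical and not taken from the paper.

```python
import torch
import torch.nn as nn

class DegradationAwareModulation(nn.Module):
    """Hypothetical sketch: fuse underwater-domain and clean-domain
    features via a degradation-conditioned per-channel gate."""
    def __init__(self, channels: int):
        super().__init__()
        # Estimate a per-channel gate in [0, 1] from the underwater features.
        self.gate_proj = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Project the concatenated features back to the working width.
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, f_uw: torch.Tensor, f_clean: torch.Tensor) -> torch.Tensor:
        # High gate values keep more of the (degraded) underwater signal;
        # low values lean on the clean-domain representation instead.
        g = self.gate_proj(f_uw)
        modulated = g * f_uw + (1.0 - g) * f_clean
        return self.fuse(torch.cat([modulated, f_clean], dim=1))

# Usage on dummy feature maps:
m = DegradationAwareModulation(channels=64)
f_uw = torch.randn(1, 64, 32, 32)     # underwater-domain features
f_clean = torch.randn(1, 64, 32, 32)  # clean-domain features
out = m(f_uw, f_clean)                # -> torch.Size([1, 64, 32, 32])
```

A gated blend of this kind is one common way to suppress degradation-specific noise while retaining content cues; the paper's actual module may differ in both structure and conditioning signal.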
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 5440