M-COLOR: MLLM-GUIDED DIFFUSION MODELS FOR IMAGE COLORIZATION

12 Sept 2025 (modified: 11 Feb 2026). Submitted to ICLR 2026. License: CC BY 4.0
Keywords: language-based image colorization, diffusion models
TL;DR: The paper proposes M-Color, a diffusion-based method that uses multimodal language models and luminance cues for semantically aligned, structurally consistent image colorization.
Abstract: Language-based image colorization transforms grayscale images into vivid, visually pleasing colorized outputs with semantic guidance. Existing methods often rely on CLIP text embeddings, which may struggle with deep semantic understanding, leading to suboptimal colorization. In this paper, we propose M-Color, a novel diffusion-based framework that leverages multimodal large language models (MLLMs) to enhance language comprehension through an Adaptive Decoding strategy. To maintain structural consistency, we introduce a Luminance-Aware Encoder (LAE) that aligns grayscale images with the colorized output, and a Luminance Extraction Module (LEM) that integrates luminance information into the latent generation process. Extensive experiments demonstrate that M-Color achieves superior semantic alignment, improves structural consistency, and outperforms state-of-the-art methods in both quantitative and qualitative evaluations.
Primary Area: generative models
Supplementary Material: pdf
Submission Number: 4452