CoT-RC: Chain-of-Thought Reflection and Correction for Image Generation without Extra Training

ICLR 2026 Conference Submission15903 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Chain-of-Thought (CoT), Image Generation, Unified Multimodal Models (UMMs), Training-free Framework
TL;DR: We propose a training-free CoT-enhanced framework that unifies global semantic correction and local refinement within Unified Multimodal Models, yielding substantial gains in image generation consistency.
Abstract: Recent studies have explored integrating Chain-of-Thought (CoT) reasoning into image generation to improve accuracy and controllability. However, existing methods either rely on costly training, separate reasoning from generation, or lack fine-grained visual error correction. We propose a training-free CoT-enhanced image generation framework that leverages the semantic understanding and positional awareness of Unified Multimodal Models (UMMs). Our method introduces a CoT-guided Reflection Module for image-level global correction and a semantic-driven token-level local correction module for fine-grained refinement, forming a dynamic reasoning loop with iterative triggers and backtracking. Experiments demonstrate that our approach improves the Show-o baseline from 68% to 78% on GenEval and achieves a 14% gain on T2I-CompBench, outperforming prior CoT-based methods under the same baseline, including reinforcement learning-based approaches. Our framework is entirely training-free, efficient, and establishes a new paradigm for CoT in image generation.
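The abstract describes a dynamic reasoning loop: generate, reflect globally, refine locally at the token level, and backtrack when a correction does not help. The paper itself is not available here, so the sketch below is only an illustrative toy in Python: every function (`generate`, `global_reflection`, `local_correction`) is a hypothetical stand-in operating on character tokens, whereas the actual modules operate on a UMM's image tokens with learned semantic and positional signals.

```python
# Toy sketch of a CoT-style reflect-and-correct loop with backtracking.
# All components are illustrative stand-ins, not the paper's implementation:
# an "image" is a list of character tokens, and reflection is exact matching.

def generate(prompt):
    # Stand-in for the UMM's initial image generation; we deliberately
    # corrupt the first token to give the loop something to correct.
    tokens = list(prompt)
    if tokens:
        tokens[0] = "?"
    return tokens

def global_reflection(tokens, prompt):
    # Stand-in for CoT-guided image-level reflection:
    # returns a prompt-consistency score in [0, 1].
    target = list(prompt)
    matches = sum(a == b for a, b in zip(tokens, target))
    return matches / max(len(target), 1)

def local_correction(tokens, prompt):
    # Stand-in for token-level local correction:
    # repair the first token that disagrees with the prompt.
    target = list(prompt)
    for i, (a, b) in enumerate(zip(tokens, target)):
        if a != b:
            return tokens[:i] + [b] + tokens[i + 1:]
    return tokens

def cot_rc(prompt, max_iters=5, threshold=1.0):
    """Iterative trigger + backtracking: keep the best-scoring candidate."""
    best = generate(prompt)
    best_score = global_reflection(best, prompt)
    for _ in range(max_iters):
        if best_score >= threshold:
            break  # trigger condition met: no further correction needed
        candidate = local_correction(best, prompt)
        score = global_reflection(candidate, prompt)
        if score > best_score:
            best, best_score = candidate, score  # accept the refinement
        # otherwise: backtrack, i.e. discard the candidate and keep `best`
    return "".join(best), best_score
```

The loop terminates either when the reflection score clears the threshold or after a fixed iteration budget, mirroring the "iterative triggers" described in the abstract; the backtracking branch ensures a failed local correction never degrades the retained candidate.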
Primary Area: generative models
Submission Number: 15903