Keywords: test-time scaling, image generation, image editing
Abstract: Test-time scaling has emerged as an effective strategy for improving image generation quality by generating multiple images and selecting the best output. However, such best-of-N schemes amount to blind resampling with different random seeds; they cannot incrementally correct errors while retaining what earlier generations got right. Some improved approaches rely on external verifiers to identify textual errors and feed them back to the model for refinement. However, these approaches do not support targeted modifications that preserve image consistency, and they introduce additional computational overhead. To address these limitations, we propose Self-Correction at Test-time (SCoT), a novel framework that equips generative models with internal self-assessment and targeted revision capabilities. Specifically, SCoT is trained to preserve correctly generated regions while autonomously modifying only the erroneous parts, eliminating the need for external guidance. This self-reflective mechanism enhances visual consistency and unlocks the model’s latent capacity for prompt-guided correction. SCoT improves over the baseline by up to 0.25 and substantially surpasses prior methods, providing a more reliable, efficient, and user-aligned approach to high-quality image generation.
Primary Area: generative models
Submission Number: 7821