Early-Retry Regeneration Framework for Improving Object-Sufficiency in Text-to-Image Generation

Shogo Ishii, Tomoaki Yamazaki, Kengo Murata, Seiya Ito, Kouzou Ohara

Published: 2026, Last Modified: 30 Apr 2026, VISAPP (1) 2026, CC BY-SA 4.0

Abstract: Text-to-Image generation is the task of generating an image corresponding to a given textual description. Although existing Text-to-Image models can generate images with high visual fidelity, they do not always render all objects specified in the input text, due to both text–image inconsistencies and the inherently probabilistic nature of the models. Consequently, users often need to regenerate images repeatedly until they obtain one that satisfies the text, which incurs additional computational cost and user inconvenience. To address this issue, we propose an early-retry regeneration framework that performs intermediate retries based on object-sufficiency judgements, reducing regeneration cost while improving object coverage. Object-sufficiency judgements are implemented in two ways: using an external model applied to rendered images, or leveraging internal representations of the generative model without requiring rendering. Experiments demonstrate that the proposed framework, using eith

External IDs: dblp:conf/visapp/IshiiYMIO26