Abstract: Text-to-Image generation is the task of generating an image that corresponds to a given textual description. Although existing Text-to-Image models can generate images with high visual fidelity, they do not always render all objects specified in the input text, owing to both text–image inconsistencies and the inherently probabilistic nature of the models. Consequently, users often need to regenerate images repeatedly until they obtain one that satisfies the text, which incurs additional computational cost and inconveniences the user. To address this issue, we propose an early-retry regeneration framework that performs intermediate retries based on object-sufficiency judgements, reducing regeneration cost while improving object coverage. Object-sufficiency judgements are implemented in two ways: using an external model applied to rendered images, or leveraging internal representations of the generative model without requiring rendering. Experiments demonstrate that the proposed framework, using eith
External IDs: dblp:conf/visapp/IshiiYMIO26
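
The early-retry idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `early_retry_generate`, `start`, `resume`, and `is_sufficient` are hypothetical, and the single checkpoint, the fallback of completing the final attempt regardless of the judgement, and the toy judge are all assumptions made for illustration. The sketch only shows why abandoning an attempt at an intermediate step can cost less than regenerating in full each time.

```python
from typing import Callable, Dict, Tuple

# Hypothetical stand-in for a partially generated sample (e.g. an intermediate latent).
State = Dict[str, int]


def early_retry_generate(
    start: Callable[[int, int], State],
    resume: Callable[[State, int], State],
    is_sufficient: Callable[[State], bool],
    total_steps: int = 50,
    check_step: int = 10,
    max_attempts: int = 5,
) -> Tuple[State, int, int]:
    """Return (final_state, attempts_used, generation_steps_spent).

    Assumption: one intermediate checkpoint at `check_step`; an attempt judged
    insufficient there is abandoned and restarted with a new seed.
    """
    if not 0 < check_step < total_steps:
        raise ValueError("check_step must lie strictly between 0 and total_steps")
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    steps_spent = 0
    for attempt in range(1, max_attempts + 1):
        # Run only the first `check_step` steps, using the attempt index as the seed.
        state = start(attempt, check_step)
        steps_spent += check_step

        # Assumed fallback: the last attempt is always completed so a result is returned.
        if is_sufficient(state) or attempt == max_attempts:
            remaining = total_steps - check_step
            final_state = resume(state, remaining)
            steps_spent += remaining
            return final_state, attempt, steps_spent

    raise AssertionError("unreachable: the final attempt always returns")


# Toy stand-ins so the sketch runs without any generative model.
def toy_start(seed: int, steps: int) -> State:
    return {"seed": seed, "steps": steps}


def toy_resume(state: State, steps: int) -> State:
    return {"seed": state["seed"], "steps": state["steps"] + steps}


def toy_judge(state: State) -> bool:
    # Pretend that only seeds >= 3 contain every requested object.
    return state["seed"] >= 3


final, attempts, cost = early_retry_generate(toy_start, toy_resume, toy_judge)
full_regeneration_cost = attempts * 50  # the same attempts, each run to completion
print(attempts, cost, full_regeneration_cost)  # prints "3 70 150"
```

In this toy run, two attempts are abandoned after 10 steps each and the third is completed, so 70 steps are spent instead of the 150 that three full generations would take. A real judge would be one of the two options the abstract names: an external model applied to a rendered intermediate image, or a check on the generative model's internal representations.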