Training-Free Location-Aware Text-to-Image Synthesis

Published: 01 Jan 2023, Last Modified: 31 Oct 2024, Venue: ICIP 2023, License: CC BY-SA 4.0
Abstract: Current large-scale generative models generate high-quality images from text prompts with impressive efficiency. However, they cannot precisely control the size and position of objects in the generated image. In this study, we analyze the generative mechanism of the Stable Diffusion model and propose a new interactive generation paradigm that allows users to specify the position of generated objects without additional training. Moreover, we propose an object detection-based evaluation metric to assess control capability in the location-aware generation task. Our experimental results show that our method outperforms state-of-the-art methods in both control capability and image quality.