MoveAnything: Controllable Scene Generation with Text-to-Image Diffusion Models

22 Sept 2023 (modified: 25 Mar 2024) · ICLR 2024 Conference Withdrawn Submission
Keywords: controllable scene generation; text-to-image diffusion; spatial image editing
TL;DR: Generating scenes with rearrangeable layouts using off-the-shelf text-to-image diffusion models
Abstract: Controllable scene generation, i.e., the task of generating images in which objects are at specific locations, is an active area of research with a wide range of applications in computer vision and graphics. Although Generative Adversarial Networks (GANs) have shown some success at this task by devising intermediate representations in which spatial content is disentangled, the quality of the generated images and the mid-level control they offer remain limited. Diffusion models, on the other hand, generate images with an unprecedented level of quality, but their generation process is hard to control, and GAN-based techniques are not directly applicable to them. In this work, we propose SceneDiffusion, a framework that optimizes spatially disentangled representations for diffusion models. Our method jointly denoises multiple scene layouts during diffusion sampling, allowing controllable scene generation with any off-the-shelf text-to-image diffusion model. The proposed approach is training-free, has negligible time overhead, and is agnostic to the denoiser architecture. In addition, it enables in-the-wild spatial image editing, allowing us to move any object in a given image while keeping the scene consistent. We build a comprehensive benchmark to quantitatively and qualitatively evaluate our approach and show that it outperforms previous works by a large margin on image quality and layout consistency.
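The abstract's core idea of jointly denoising multiple spatially shifted layouts of one scene can be sketched as follows. This is a minimal toy illustration, not the authors' implementation: `toy_denoiser` is a hypothetical stand-in for a pretrained text-to-image denoiser's noise prediction, the shift-then-average update is a simplified assumption about how per-layout predictions might be fused, and all names and shapes are illustrative.

```python
import numpy as np

def toy_denoiser(x, t):
    # Hypothetical placeholder for a pretrained diffusion model's
    # noise prediction (e.g. an epsilon-prediction U-Net).
    return 0.1 * x

def joint_denoise(noise, shifts, steps=10):
    """Jointly denoise several spatially shifted copies of one scene.

    Each layout is the same shared latent rolled by a (dy, dx) offset.
    At every step, the per-layout noise predictions are unshifted back
    and averaged, so all layouts remain consistent with a single scene.
    (Simplified update rule; the actual sampler would follow a proper
    diffusion schedule.)
    """
    scene = noise.copy()
    for t in range(steps, 0, -1):
        preds = []
        for dy, dx in shifts:
            layout = np.roll(scene, (dy, dx), axis=(0, 1))
            eps = toy_denoiser(layout, t)
            # Undo the shift so predictions align in the shared frame.
            preds.append(np.roll(eps, (-dy, -dx), axis=(0, 1)))
        scene = scene - np.mean(preds, axis=0)
    return scene

shared_noise = np.random.default_rng(0).normal(size=(8, 8, 4))
result = joint_denoise(shared_noise, shifts=[(0, 0), (2, 0), (0, 3)])
print(result.shape)  # (8, 8, 4)
```

Rearranging the layout then amounts to choosing a different set of shifts at sampling time, which is why the procedure is training-free and denoiser-agnostic.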
Primary Area: generative models
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 4462