Large Scene Synthesis Controlled With Detailed Text Using View-wise Conditional Joint Diffusion With Hierarchical Spatial Controls

22 Sept 2023 (modified: 25 Mar 2024) · ICLR 2024 Conference Withdrawn Submission
Keywords: Large-Scale Image Synthesis, Text-guided Image Generation, Diffusion Model
TL;DR: We propose a text-guided large-scale image synthesis model that generates seamless large images controlled only by detailed text.
Abstract: Recently, text-driven large scene image synthesis has made significant progress with diffusion models, but controlling the generated scenes remains challenging. While adding segmentation maps with corresponding texts has greatly improved the controllability of large scene synthesis, adding more text so that a large scene faithfully reflects detailed descriptions is still difficult. Here, we propose DetText2Scene, a novel detailed-text-driven large-scale image synthesis method that achieves high faithfulness to the given descriptions, high controllability, and high naturalness in global context. DetText2Scene consists of 1) a hierarchical keypoint-box layout conversion that leverages a large language model to turn the detailed text into spatial controls, 2) a view-wise conditioned joint diffusion process that synthesizes a large scene from the given detailed text and the spatial controls in the grounded hierarchical keypoint-box layout, and 3) a pixel perturbation-based hierarchical enhancement that hierarchically refines the scene for global coherence. In experiments, DetText2Scene significantly outperforms prior arts qualitatively and quantitatively, both when they synthesize images from the detailed text and when they are given our generated keypoint-box layouts. It achieves strong faithfulness to detailed descriptions, superior controllability, and excellent naturalness in global context, as measured by CLIP scores and/or user studies.
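The abstract does not specify how the view-wise conditioned joint diffusion is implemented. A minimal sketch, assuming a MultiDiffusion-style fusion (each overlapping window of a large latent is denoised separately and the overlapping predictions are averaged) and a simple view-wise condition (the global text plus the descriptions of layout boxes that intersect each window), might look like the following. `Box`, `view_prompt`, `joint_step`, and the `denoise` callable are hypothetical names; keypoints and the hierarchical enhancement stage are omitted, and a real system would call a pretrained diffusion model's denoiser instead of a toy function.

```python
import numpy as np
from dataclasses import dataclass


@dataclass
class Box:
    # Hypothetical layout entry: one text description grounded to a box (latent coords)
    text: str
    x0: int
    y0: int
    x1: int
    y1: int


def window_origins(size, win, stride):
    """Start offsets of overlapping windows that together cover [0, size)."""
    starts = list(range(0, max(size - win, 0) + 1, stride))
    if starts[-1] + win < size:
        starts.append(size - win)  # extra window so the far edge is covered
    return starts


def view_prompt(global_text, boxes, y, x, win):
    """Assumed view-wise condition: global text plus boxes intersecting this view."""
    local = [b.text for b in boxes
             if b.x0 < x + win and b.x1 > x and b.y0 < y + win and b.y1 > y]
    return "; ".join([global_text] + local)


def joint_step(latent, denoise, global_text, boxes, win, stride):
    """One joint step: denoise each view with its own prompt, average overlaps.

    latent has shape (H, W, C); denoise(window, prompt) returns an array of
    the same shape as window (a stand-in for a real diffusion denoiser).
    """
    H, W = latent.shape[:2]
    acc = np.zeros_like(latent)
    count = np.zeros((H, W, 1))
    for y in window_origins(H, win, stride):
        for x in window_origins(W, win, stride):
            prompt = view_prompt(global_text, boxes, y, x, win)
            acc[y:y + win, x:x + win] += denoise(latent[y:y + win, x:x + win], prompt)
            count[y:y + win, x:x + win] += 1
    return acc / count  # per-pixel average over all views covering that pixel
```

Averaging the per-view predictions is what lets neighboring views agree on shared pixels, so the fused latent stays seamless even though each view is conditioned on a different prompt.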
Supplementary Material: pdf
Primary Area: generative models
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 4963