P+: Extended Textual Conditioning in Text-to-Image Generation

20 Sept 2023 (modified: 25 Mar 2024), ICLR 2024 Conference Withdrawn Submission
Keywords: Text-to-image, Textual Inversion, Diffusion, Image Generation
TL;DR: We introduce an Extended Textual Conditioning space in text-to-image diffusion models that provides finer control over different aspects of generation.
Abstract: We introduce an Extended Textual Conditioning space in text-to-image diffusion models, referred to as P+. This space consists of multiple textual conditions, derived from per-layer prompts, each corresponding to a cross-attention layer of the denoising U-Net of the diffusion model. We show that the extended space provides greater control over the synthesis process. We further introduce Extended Textual Inversion (XTI), which inverts concepts into P+ so that they are represented with per-layer tokens. We show that XTI is more expressive and precise, and converges faster, than the original Textual Inversion (TI). Compared to baselines, XTI achieves much better reconstruction and editability without the need to balance these two goals. We conduct a series of extensive experiments to analyze and understand the properties of the new space, and to showcase the effectiveness of our method for personalizing text-to-image models. Furthermore, we use the unique properties of this space to achieve previously unattainable results in object-style mixing with text-to-image models.
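The difference between a single textual condition and the per-layer conditions described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the layer names in `LAYERS`, the `toy_encode` stand-in for a text encoder, and both helper functions are hypothetical and exist only to show how one prompt shared by all cross-attention layers differs from one prompt per layer.

```python
from typing import Dict, List

# Hypothetical cross-attention layer names (assumption); a real denoising
# U-Net has its own, larger set of cross-attention layers.
LAYERS = ["down_0", "down_1", "mid", "up_0", "up_1"]


def toy_encode(prompt: str, dim: int = 4) -> List[float]:
    """Toy stand-in for a text encoder (assumption, not a real model).

    Maps a prompt to a small deterministic vector built from character codes.
    """
    return [sum(ord(c) for c in prompt[i::dim]) / 1000.0 for i in range(dim)]


def standard_conditioning(prompt: str, layers: List[str]) -> Dict[str, List[float]]:
    """Single condition: every cross-attention layer receives the same embedding."""
    embedding = toy_encode(prompt)
    return {layer: embedding for layer in layers}


def extended_conditioning(
    per_layer_prompts: Dict[str, str], layers: List[str], default: str
) -> Dict[str, List[float]]:
    """Per-layer conditions: each layer receives the embedding of its own prompt.

    Layers without an explicit prompt fall back to `default`.
    """
    return {
        layer: toy_encode(per_layer_prompts.get(layer, default)) for layer in layers
    }


if __name__ == "__main__":
    shared = standard_conditioning("a red lizard", LAYERS)
    mixed = extended_conditioning(
        {"down_0": "a green lizard"}, LAYERS, default="a red lizard"
    )
    for layer in LAYERS:
        print(layer, shared[layer] == mixed[layer])
```

In this sketch, a learned per-layer token (as in XTI) would correspond to replacing the fixed `toy_encode` output for a placeholder word with one trainable vector per layer; that training loop is omitted here.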
Primary Area: generative models
Submission Number: 2471