Taming Encoder for Zero Fine-tuning Image Customization with Text-to-Image Diffusion Models

16 Sept 2023 (modified: 25 Mar 2024) · ICLR 2024 Conference Withdrawn Submission
Keywords: diffusion model; DreamBooth; personalization
Abstract: This paper presents a novel approach for creating customized images of objects according to user specifications. Unlike previous methods, which typically rely on time-consuming per-object optimization, our method is built on a framework designed to expedite the process. It employs an encoder that captures the essential high-level characteristics of an object and produces an object-specific embedding in a single feed-forward pass. This embedding is then consumed by a text-to-image synthesis model for image generation. To seamlessly integrate the object-aware embedding space into a well-established text-to-image model within the same generation context, we explore various network architectures and training strategies. Furthermore, we introduce a simple yet highly effective regularized joint training scheme that incorporates an object identity preservation loss. We also propose a caption generation scheme that is crucial for faithfully representing object-specific embeddings throughout the image generation process. Together, these components allow users to retain control over generation and provide editing capabilities.
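The core idea in the abstract — an encoder producing an object embedding in one feed-forward pass, which then conditions a text-to-image model alongside the caption — can be sketched minimally as follows. All module names, dimensions, and the pseudo-token conditioning scheme below are illustrative assumptions, not the authors' actual architecture:

```python
# Hedged sketch of encoder-based customization: a reference image is mapped
# to a single object embedding in one forward pass (no per-object tuning),
# and that embedding is appended to the caption-token embeddings that would
# condition a (stubbed-out) text-to-image diffusion model.
import torch
import torch.nn as nn

class ObjectEncoder(nn.Module):
    """Maps a reference image to one object embedding in a single pass.
    (Hypothetical MLP stand-in for the paper's unspecified encoder.)"""
    def __init__(self, img_dim=3 * 64 * 64, embed_dim=768):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(img_dim, 1024),
            nn.GELU(),
            nn.Linear(1024, embed_dim),
        )

    def forward(self, image):
        return self.net(image)  # (batch, embed_dim)

def build_conditioning(text_tokens, object_embedding):
    """Append the object embedding as an extra pseudo-token so the
    generator attends to it alongside the caption tokens (one common
    integration strategy; the paper explores several)."""
    return torch.cat([text_tokens, object_embedding.unsqueeze(1)], dim=1)

encoder = ObjectEncoder()
image = torch.randn(2, 3, 64, 64)      # reference object images (assumed size)
text_tokens = torch.randn(2, 77, 768)  # e.g. CLIP-style caption embeddings
obj_emb = encoder(image)               # one feed-forward pass per object
cond = build_conditioning(text_tokens, obj_emb)
print(cond.shape)  # torch.Size([2, 78, 768])
```

The contrast with per-object optimization (e.g. DreamBooth) is that nothing here is fine-tuned at customization time: the encoder is trained once, and each new object costs only one forward pass.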
Primary Area: generative models
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 522