Keywords: Diffusion models, score-based models, generative models, personalization
TL;DR: We propose a method that utilizes pre-trained text-to-image diffusion models to generate a custom dataset.
Abstract: Recently, several large-scale text-to-image diffusion models have been released, showing unprecedented performance. Since the shift from learning a task-specific model from scratch to leveraging pre-trained large-scale models is an inevitable trend in deep generative modeling, it is necessary to develop methods that better utilize these models. In this paper, we propose a method dubbed Diffusion model for Your Own Data (DYOD) that can effectively utilize a pre-trained text-to-image diffusion model to approximate the implicit distribution of a custom dataset. Specifically, we first obtain a text prompt that best represents the custom dataset through optimization in the semantic latent space of the diffusion model. To better control the content of generated images, in particular the geometry of objects, we show that the text prompt alone is not sufficient; an informative initialization that guides the pre-trained diffusion model is also necessary. As representative examples, we demonstrate that a learned distribution initialization from the user's dataset, or an image initialization from a user-provided sketch or photo, serves this goal of customizing the diffusion model to the user's own data. Experiments show that the customized DYOD outperforms the Stable Diffusion baselines both qualitatively and quantitatively, with accelerated sampling speed.
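The first step described in the abstract, optimizing a prompt representation so that a frozen pre-trained diffusion model fits the custom dataset, can be illustrated with a minimal sketch. This is not the authors' code: the model checkpoint, embedding shape, initialization, and learning rate are assumptions for illustration, and the learnable continuous prompt embedding stands in for the paper's optimization in the semantic latent space.

```python
# Hedged sketch: optimize a continuous prompt embedding so the frozen Stable
# Diffusion denoiser reconstructs noised latents of the user's images.
import torch
from diffusers import AutoencoderKL, UNet2DConditionModel, DDPMScheduler

device = "cuda"
model_id = "runwayml/stable-diffusion-v1-5"  # assumed checkpoint
vae = AutoencoderKL.from_pretrained(model_id, subfolder="vae").to(device)
unet = UNet2DConditionModel.from_pretrained(model_id, subfolder="unet").to(device)
scheduler = DDPMScheduler.from_pretrained(model_id, subfolder="scheduler")
vae.requires_grad_(False)
unet.requires_grad_(False)

# Learnable prompt embedding (77 tokens x 768 dims for SD v1.x); shape and
# random initialization are assumptions, not the paper's choices.
prompt_emb = torch.randn(1, 77, 768, device=device, requires_grad=True)
opt = torch.optim.Adam([prompt_emb], lr=1e-3)

def training_step(pixel_batch: torch.Tensor) -> float:
    """One step on a batch of user images scaled to [-1, 1], shape (B, 3, 512, 512)."""
    latents = vae.encode(pixel_batch.to(device)).latent_dist.sample()
    latents = latents * vae.config.scaling_factor
    noise = torch.randn_like(latents)
    t = torch.randint(0, scheduler.config.num_train_timesteps,
                      (latents.shape[0],), device=device)
    noisy = scheduler.add_noise(latents, noise, t)
    # Standard denoising (noise-prediction) loss with the learnable prompt as conditioning.
    pred = unet(noisy, t,
                encoder_hidden_states=prompt_emb.expand(latents.shape[0], -1, -1)).sample
    loss = torch.nn.functional.mse_loss(pred, noise)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

For the second ingredient described in the abstract, an informative initialization, sampling would start from a noised encoding of a user-provided sketch/photo or a sample from a learned distribution rather than from pure Gaussian noise; that step is omitted here.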
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Generative models