ARTIST: Towards Disentangled Text Painter with Diffusion Models

22 Sept 2023 (modified: 25 Mar 2024) · ICLR 2024 Conference Withdrawn Submission
Keywords: Diffusion Models, Image generation
Abstract: Diffusion models have shown remarkable performance in generating a broad spectrum of visual content. However, their text rendering ability is still limited: they often generate incorrect characters or words that do not blend well with the background image. To address this, we introduce a novel framework named ARTIST, which includes an additional textual diffusion model focused on learning text structure. We first pretrain the textual diffusion model. We then fine-tune the visual model to learn how to inject textual structure information from the frozen textual model into the image. This disentangled architecture design and training strategy significantly enhance the text rendering ability of diffusion models for text-rich image generation. Furthermore, we leverage pretrained large language models to infer the user's intention, leading to better generation quality. Empirical results on the MARIO-Eval benchmark underscore the effectiveness of the proposed method, showing an improvement of up to 15\% across various metrics.
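The abstract describes a two-stage, disentangled training scheme: a textual diffusion model is pretrained on text structure, then frozen, and a visual model is fine-tuned to inject its features. The sketch below is a minimal, hypothetical PyTorch illustration of that idea; the module names (`TextualDenoiser`, `VisualDenoiser`), architectures, and the simplified noising step are assumptions for illustration and are not the authors' implementation.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the two denoisers; the real ARTIST modules are not described in detail here.
class TextualDenoiser(nn.Module):
    """Learns glyph/text-structure features from noisy text images (stage 1)."""
    def __init__(self, dim=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Conv2d(3, dim, 3, padding=1), nn.SiLU())
        self.decoder = nn.Conv2d(dim, 3, 3, padding=1)

    def forward(self, noisy_text_img, return_features=False):
        feats = self.encoder(noisy_text_img)
        eps = self.decoder(feats)  # predicted noise
        return (eps, feats) if return_features else eps


class VisualDenoiser(nn.Module):
    """Full-image denoiser that injects features from the frozen textual branch (stage 2)."""
    def __init__(self, dim=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Conv2d(3, dim, 3, padding=1), nn.SiLU())
        self.inject = nn.Conv2d(dim, dim, 1)  # maps textual features into the visual stream
        self.decoder = nn.Conv2d(dim, 3, 3, padding=1)

    def forward(self, noisy_img, text_feats):
        h = self.encoder(noisy_img) + self.inject(text_feats)
        return self.decoder(h)


# Stage 1 (assumed already done): pretrain the textual denoiser, then freeze it.
textual = TextualDenoiser()
for p in textual.parameters():
    p.requires_grad_(False)

# Stage 2: fine-tune only the visual denoiser with a standard noise-prediction loss.
visual = VisualDenoiser()
opt = torch.optim.AdamW(visual.parameters(), lr=1e-4)

img = torch.randn(2, 3, 64, 64)   # toy batch standing in for text-rich images
noise = torch.randn_like(img)
noisy = img + noise               # simplified forward diffusion (no noise schedule)

with torch.no_grad():
    _, text_feats = textual(noisy, return_features=True)

pred = visual(noisy, text_feats)
loss = nn.functional.mse_loss(pred, noise)
loss.backward()
opt.step()
```

The key design point the abstract emphasizes is that the textual branch stays frozen during stage 2, so the visual model learns only how to consume its structure features rather than overwriting them.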
Supplementary Material: pdf
Primary Area: generative models
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 5802