Predicated Diffusion: Predicate Logic-Based Attention Guidance for Text-to-Image Diffusion Models

Kota Sueyoshi; Takashi Matsubara

Predicated Diffusion: Predicate Logic-Based Attention Guidance for Text-to-Image Diffusion Models

Kota Sueyoshi, Takashi Matsubara

21 Sept 2023 (modified: 25 Mar 2024)ICLR 2024 Conference Withdrawn SubmissionEveryoneRevisionsBibTeX

Keywords: diffusion models, text-based image generation, attention map, predicate logic

TL;DR: For text-based image generation, Predicated Diffusion translates text into guidance based on predicate logic, improving the faithfulness of generated images to the text.

Abstract: Diffusion models have achieved remarkable results in generating high-quality, diverse, and creative images. However, when it comes to text-based image generation, they often fail to capture the intended meaning presented in the text. For instance, a specified object may not be generated, an unnecessary object may be generated, and an adjective may alter objects it was not intended to modify. Moreover, we found that relationships indicating possession between objects are often overlooked. While users' intentions in the text are diverse, existing methods tend to specialize in only some aspects of these. In this paper, we propose Predicated Diffusion, a unified framework to express users' intentions. We consider that the root of the above issues lies in the text encoder, which often focuses only on individual words and neglects the logical relationships between them. The proposed method does not solely rely on the text encoder, but instead, represents the intended meaning in the text as propositions using predicate logic and treats the pixels in the attention maps as the fuzzy predicates. This enables us to obtain a differentiable loss function that makes the image fulfill the proposition by minimizing it. When compared to several existing methods, we demonstrated that Predicated Diffusion can generate images that are more faithful to various text prompts, as verified by human evaluators and pretrained image-text models.

Supplementary Material: zip

Primary Area: generative models

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 3255

Loading