Keywords: text-to-image, generative adversarial networks, self-attention, semantic consistency
TL;DR: This paper proposes Dice-GAN, a novel GAN-based text-to-image model with diversity injection and consistency enhancement.
Abstract: In text-to-image synthesis, a central challenge is to generate images that are both high in quality and diverse while remaining semantically consistent with the textual description. Although existing research has made significant progress, there is still room to improve image quality and diversity. In this study, we propose Dice-GAN, an efficient attention-based text-to-image synthesis model built on a generative adversarial network. To enhance the diversity of the generated images, we design a diversity injection module, which injects noise several times during the image generation process, fuses the noise with the textual information, and incorporates a self-attention mechanism to help the generator maintain global structural consistency. To improve semantic consistency, we design a consistency enhancement module, which combines word vectors with a hybrid attention mechanism to dynamically adjust the weights of different image regions. We conduct experiments on two widely used benchmark datasets, CUB and COCO. Dice-GAN shows significant improvements over existing approaches in the fidelity and diversity of generated images.
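As a rough illustration of the diversity injection idea described in the abstract (fresh noise injected at several stages, fused with the text representation, followed by self-attention), here is a minimal sketch in plain Python. It is not the authors' implementation: the function names, the additive `tanh` fusion, the residual dot-product attention, and all parameters (`noise_scale`, `num_regions`, `num_stages`) are assumptions chosen only to make the data flow concrete.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(feats):
    """Residual dot-product self-attention over a list of region vectors."""
    d = len(feats[0])
    out = []
    for q in feats:
        # Similarity of this region to every region (including itself).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in feats]
        weights = softmax(scores)
        # Weighted sum of all region vectors, added back to the query region.
        attended = [sum(w * v[j] for w, v in zip(weights, feats)) for j in range(d)]
        out.append([x + a for x, a in zip(q, attended)])
    return out


def inject_diversity(feats, text_vec, rng, noise_scale=0.1):
    """One hypothetical injection step: fresh noise fused with text, then attention."""
    noise = [rng.gauss(0.0, noise_scale) for _ in text_vec]
    # Assumed fusion rule: squash (text + noise) and add it to every region.
    cond = [math.tanh(t + n) for t, n in zip(text_vec, noise)]
    fused = [[x + c for x, c in zip(region, cond)] for region in feats]
    return self_attention(fused)


def generate_features(text_vec, num_regions=4, num_stages=3, seed=0):
    """Run several injection stages, drawing new noise at each stage."""
    rng = random.Random(seed)
    d = len(text_vec)
    feats = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(num_regions)]
    for _ in range(num_stages):
        feats = inject_diversity(feats, text_vec, rng)
    return feats
```

Calling `generate_features` twice with the same text vector but different seeds yields different feature maps, which is the kind of per-sample variation that repeated noise injection is meant to encourage. The attention step lets every region see all others, a stand-in for the global structural consistency the abstract mentions.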
Primary Area: generative models
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 4454