COMIM-GAN: Improved Text-to-Image Generation via Condition Optimization and Mutual Information Maximization

Published: 01 Jan 2023 · Last Modified: 13 Nov 2024 · MMM (1) 2023 · CC BY-SA 4.0
Abstract: Language-based image generation is a challenging task. Current studies typically adopt the conditional generative adversarial network (cGAN) framework and have achieved significant progress. Nonetheless, a close examination of these methods reveals two fundamental issues. First, the discrete linguistic conditions make cGAN training extremely difficult and impair the model's generalization. Second, the conditional discriminator cannot extract semantically consistent features from the linguistic conditions, which hinders conditional discrimination. To address these issues, we propose a condition optimization and mutual information maximization GAN (COMIM-GAN). Specifically, we design (1) a text condition construction module that builds a compact linguistic condition space, and (2) a mutual information loss between images and linguistic conditions that encourages the discriminator to extract features associated with the linguistic conditions. Extensive experiments on the CUB-200 and MS-COCO datasets demonstrate that our method outperforms existing methods.
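The abstract does not give the exact form of the mutual information loss. A common way to maximize mutual information between two modalities is a contrastive (InfoNCE-style) lower bound, where matched image-text pairs are pulled together and unmatched pairs in the batch act as negatives. The sketch below is a hypothetical illustration of that general idea, not the paper's actual objective; the function name, temperature value, and feature dimensions are all assumptions.

```python
import numpy as np

def mi_nce_loss(img_feats, txt_feats, temperature=0.1):
    """Hypothetical InfoNCE-style lower bound on the mutual information
    between image features and linguistic-condition embeddings.
    Matched (image, text) pairs lie on the diagonal of the similarity
    matrix; every other pair in the batch serves as a negative."""
    img = img_feats / np.linalg.norm(img_feats, axis=1, keepdims=True)
    txt = txt_feats / np.linalg.norm(txt_feats, axis=1, keepdims=True)
    logits = img @ txt.T / temperature            # (B, B) cosine similarities
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Minimizing this loss maximizes the MI lower bound between modalities.
    return -np.mean(np.diag(log_probs))

# Toy usage with random features (B=8, d=64)
rng = np.random.default_rng(0)
imgs = rng.standard_normal((8, 64))
txts = rng.standard_normal((8, 64))
loss = mi_nce_loss(imgs, txts)
```

Perfectly aligned features (e.g. `mi_nce_loss(imgs, imgs)`) drive the loss toward zero, while independent features keep it near `log B`, which is the intuition behind using such a term to make the discriminator's features track the linguistic condition.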
