Structure-Aware Generative Adversarial Network for Text-to-Image Generation

Published: 01 Jan 2023 · Last Modified: 13 Nov 2024 · ICIP 2023 · CC BY-SA 4.0
Abstract: Text-to-image generation aims to synthesize photo-realistic images from textual descriptions. Existing methods typically align images with their corresponding texts in a joint semantic space. However, the modality gap in this joint space leads to misalignment, and the limited receptive field of convolutional neural networks causes structural distortions in the generated images. In this work, a structure-aware generative adversarial network (SaGAN) is proposed to (1) semantically align multimodal features in the joint semantic space in a learnable manner, and (2) improve the structure and contours of generated images through designed content-invariant negative samples. Experimental results show that SaGAN improves FID by over 30.1% and 8.2% on the CUB and COCO datasets, respectively, compared with state-of-the-art approaches.
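The abstract does not spell out the concrete losses, so the following is only a minimal sketch of the two ideas it names: a learnable image-text alignment objective (assumed here to be a CLIP-style symmetric contrastive loss) and content-invariant negative samples (assumed here to be patch-shuffled images that keep colors and textures but break global structure, fed to the discriminator as extra fakes). All function names, shapes, and the patch-shuffling scheme are illustrative assumptions, not the paper's actual formulation.

```python
# Hedged sketch (PyTorch); the exact SaGAN losses are not given in the abstract.
import torch
import torch.nn.functional as F


def alignment_loss(img_feats: torch.Tensor, txt_feats: torch.Tensor,
                   temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss aligning image and text features in a joint space.

    Assumption: a CLIP-style contrastive objective stands in for the paper's
    "learnable" multimodal alignment.
    """
    img = F.normalize(img_feats, dim=-1)
    txt = F.normalize(txt_feats, dim=-1)
    logits = img @ txt.t() / temperature                  # (B, B) similarity matrix
    targets = torch.arange(img.size(0), device=img.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))


def shuffle_patches(images: torch.Tensor, patch: int = 32) -> torch.Tensor:
    """Build structure-distorted negatives by shuffling spatial patches.

    Assumption: a shuffled image keeps the same content statistics (colors,
    textures) but loses global structure, so it can serve as a
    "content-invariant" negative for a structure-aware discriminator.
    """
    b, c, h, w = images.shape
    gh, gw = h // patch, w // patch
    patches = images.unfold(2, patch, patch).unfold(3, patch, patch)  # (B,C,gh,gw,p,p)
    patches = patches.contiguous().view(b, c, gh * gw, patch, patch)
    perm = torch.randperm(gh * gw, device=images.device)
    patches = patches[:, :, perm]                                     # shuffle patch order
    patches = patches.view(b, c, gh, gw, patch, patch)
    return patches.permute(0, 1, 2, 4, 3, 5).contiguous().view(b, c, h, w)


# Usage sketch with dummy tensors.
imgs = torch.randn(4, 3, 256, 256)
img_feats, txt_feats = torch.randn(4, 256), torch.randn(4, 256)
loss_align = alignment_loss(img_feats, txt_feats)
negatives = shuffle_patches(imgs)  # treated as additional fake samples for the discriminator
```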