Highlights

• We employ an efficient single-stage GAN structure with fewer parameters and faster inference speed.
• A novel Context-Aware Text-Image Block improves vision-language semantic consistency for text-to-image synthesis.
• An innovative Attention Convolution Module enriches the diversity and quality of synthesized images.
• Mixing self-attention with convolution facilitates the understanding of complex images, improving language-vision matching.