Towards photorealistic face generation using text-guided Semantic-Spatial FaceGAN

Published: 2025 · Last Modified: 25 Jan 2026 · Multimedia Tools and Applications, 2025 · CC BY-SA 4.0
Abstract: In this paper, we propose a simple yet effective Text-to-Face (T2F) generative adversarial network, the Semantic-Spatial FaceGAN (SS-FaceGAN), for generating facial images from natural language descriptions. Natural language is inherently abstract, whereas images are concrete; this discrepancy makes accurate generation difficult, especially when multiple descriptions must be combined into one image. SS-FaceGAN addresses this by generating precise facial features from multiple descriptions. It incorporates a novel Focus Spatial (FS) module, which predicts masks from text semantics to refine the image feature maps, and a Word Attention Reuse (WAR) module, which leverages the distribution of each word in the description to compute word-level attention. Experiments demonstrate the effectiveness of our approach.