Bridging The Domain Gap Arising from Text Description Differences for Stable Text-To-Image Generation

Published: 01 Jan 2024 · Last Modified: 13 Nov 2024 · ICASSP 2024 · CC BY-SA 4.0
Abstract: Generating high-quality images that conform to the semantics of captions has numerous potential applications. However, text-to-image generation is a challenging task due to its cross-modal nature. Current generative models are often unstable: complex sentences can result in poor image quality. In this paper, we propose a novel model that bridges the domain gap arising from sentence complexity to achieve stable text-to-image generation. Our model includes two key modules: an attribute extraction module and an attribute fusion module. The former extracts attributes from captions; the latter fuses them with image features, encouraging the model to understand caption semantics accurately. Both modules are plug-and-play, and extensive experiments demonstrate that our approach outperforms state-of-the-art GAN models. Our code and trained model are available at https://github.com/tantian21/stable-t2i-generation.
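The abstract names two modules but gives no implementation details. The toy sketch below (every function name, the adjective lexicon, the embedding, and the gated-fusion rule are our own assumptions, not from the paper) illustrates one plausible shape for such a pipeline: pull (adjective, noun) attribute pairs out of a caption, then fuse their embeddings into an image feature vector with a gated residual addition.

```python
import numpy as np

# Hypothetical sketch only: the paper does not specify these mechanisms.

ADJECTIVES = {"red", "small", "fluffy", "yellow"}  # toy lexicon (assumption)

def extract_attributes(caption: str) -> list[tuple[str, str]]:
    """Toy attribute extraction: collect (adjective, noun) pairs,
    assuming an adjective directly precedes the noun it modifies."""
    tokens = caption.lower().replace(",", "").split()
    pairs = []
    for i, tok in enumerate(tokens[:-1]):
        if tok in ADJECTIVES:
            pairs.append((tok, tokens[i + 1]))
    return pairs

def embed(word: str, dim: int = 8) -> np.ndarray:
    """Deterministic toy embedding (hash-seeded); stands in for the
    learned text encoder a real model would use."""
    rng = np.random.default_rng(abs(hash(word)) % (2**32))
    return rng.standard_normal(dim)

def fuse(image_feat: np.ndarray, attributes) -> np.ndarray:
    """Toy attribute fusion: gated residual addition of the mean
    attribute embedding onto the image feature vector."""
    if not attributes:
        return image_feat
    attr_vec = np.mean([embed(a) + embed(n) for a, n in attributes], axis=0)
    gate = 1.0 / (1.0 + np.exp(-(image_feat * attr_vec)))  # sigmoid gate
    return image_feat + gate * attr_vec

pairs = extract_attributes("a small bird with a red head")
fused = fuse(np.zeros(8), pairs)
print(pairs)        # [('small', 'bird'), ('red', 'head')]
print(fused.shape)  # (8,)
```

A real system would replace the lexicon with a parser or learned tagger and the gate with trained fusion layers; the sketch only shows where the two modules sit relative to each other in the forward pass.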