PororoGAN: An Improved Story Visualization Model on Pororo-SV Dataset

Gangyan Zeng, Zhaohui Li, Yuan Zhang

Published: 2019, Last Modified: 16 May 2023, CSAI 2019, Readers: Everyone

Abstract: Generating a sequence of images from a multi-sentence paragraph is a recently proposed task called story visualization. Unlike single-image generation, this task requires maintaining global consistency across dynamic scenes and characters throughout the story flow, which is also a significant challenge. However, the visual quality and semantic relevance of existing results are unsatisfactory on datasets with high semantic complexity, such as the Pororo-SV cartoon dataset. To address this issue, we propose a new story visualization model named PororoGAN, which jointly considers story-to-image-sequence, sentence-to-image, and word-to-image-patch alignment. In particular, we introduce an aligned sentence encoder (ASE) and an attentional word encoder (AWE) to improve global and local relevance, respectively. Additionally, we add an image-patch discriminator to improve the realism of the results. Both quantitative and qualitative studies show that PororoGAN outperforms the state-of-the-art models.
