PororoGAN: An Improved Story Visualization Model on Pororo-SV Dataset

Published: 01 Jan 2019, Last Modified: 16 May 2023 | CSAI 2019 | Readers: Everyone
Abstract: Generating a sequence of images from a multi-sentence paragraph is a recently proposed task called story visualization. What distinguishes this task from single-image generation, and what makes it challenging, is the need to keep scenes and characters globally consistent as the story unfolds. However, existing methods yield unsatisfying visual quality and semantic relevance on datasets with high semantic complexity, such as the Pororo-SV cartoon dataset. To address this issue, we propose a new story visualization model named PororoGAN, which jointly considers story-to-image-sequence, sentence-to-image, and word-to-image-patch alignment. In particular, we introduce an aligned sentence encoder (ASE) and an attentional word encoder (AWE) to improve global and local relevance, respectively. Additionally, we add an image-patch discriminator to improve the realism of the results. Both quantitative and qualitative studies show that PororoGAN outperforms the state-of-the-art models.
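The abstract does not say how word-to-image-patch alignment is computed. As a rough illustration of the general idea only, the sketch below uses plain dot-product attention, in which each image patch attends over the words of a sentence. It is not the authors' AWE: the function name `word_to_patch_attention`, the feature shapes, and the random inputs are all assumptions made for this example.

```python
import numpy as np


def word_to_patch_attention(patches, words):
    """Generic dot-product attention from image patches to words.

    Hypothetical helper, not taken from the paper.
    patches: array of shape (N, D), one feature vector per image patch.
    words:   array of shape (T, D), one embedding per word.
    Returns (context, weights) with shapes (N, D) and (N, T).
    """
    patches = np.asarray(patches, dtype=float)
    words = np.asarray(words, dtype=float)

    # Similarity between every patch and every word: shape (N, T).
    scores = patches @ words.T

    # Numerically stable softmax over the word axis.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)

    # Each patch receives a weighted average of the word embeddings.
    context = weights @ words
    return context, weights


# Example usage with made-up sizes: 4 patches, 5 words, 8-dim features.
rng = np.random.default_rng(0)
patches = rng.normal(size=(4, 8))
words = rng.normal(size=(5, 8))
context, weights = word_to_patch_attention(patches, words)
```

Each row of `weights` sums to one, so it can be read as how strongly a given patch relates to each word. A model along the lines described in the abstract would presumably learn the patch and word features rather than sample them at random.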