TFGAN: Improving Conditioning for Text-to-Video Synthesis

27 Sept 2018 (modified: 05 May 2023) · ICLR 2019 Conference Withdrawn Submission
Abstract: Developing conditional generative models for text-to-video synthesis is an extremely challenging yet important research topic in machine learning. In this work, we address this problem by introducing the Text-Filter conditioning Generative Adversarial Network (TFGAN), a GAN model with a novel conditioning scheme that strengthens the associations between text and video. By combining this conditioning scheme with a deep GAN architecture, TFGAN generates photo-realistic videos from text on very challenging real-world video datasets. In addition, we construct a benchmark synthetic dataset of moving shapes to systematically evaluate our conditioning scheme. Extensive experiments demonstrate that TFGAN significantly outperforms existing approaches and can also generate videos of categories not seen during training.
Keywords: Conditional GAN, Video Generation, Text-to-Video Synthesis, Conditional Generative Models, Deep Generative Models
TL;DR: An effective text-conditioning GAN framework for generating videos from text
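The name "Text-Filter conditioning" refers to generating convolution filters from the encoded text and applying them to the video's feature maps, rather than simply concatenating a text vector onto the features. Below is a minimal PyTorch sketch of that idea; the class name TextFilterConditioning, the single-linear filter generator, and all dimensions are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TextFilterConditioning(nn.Module):
    """Sketch: generate conv filters from a text embedding and convolve
    them with feature maps, so the text directly modulates the features.
    All names, shapes, and hyperparameters are illustrative assumptions."""

    def __init__(self, text_dim=256, in_ch=64, out_ch=64, k=3):
        super().__init__()
        self.in_ch, self.out_ch, self.k = in_ch, out_ch, k
        # One linear layer maps the sentence embedding to a filter bank.
        self.filter_gen = nn.Linear(text_dim, out_ch * in_ch * k * k)

    def forward(self, feat, text_emb):
        # feat: (B, in_ch, H, W) per-frame features; text_emb: (B, text_dim)
        B, C, H, W = feat.shape
        filters = self.filter_gen(text_emb).view(B * self.out_ch, C, self.k, self.k)
        # Grouped conv applies each sample's own text-derived filters.
        out = F.conv2d(feat.reshape(1, B * C, H, W), filters,
                       padding=self.k // 2, groups=B)
        return out.view(B, self.out_ch, H, W)


# Usage: condition per-frame discriminator features on an encoded sentence.
cond = TextFilterConditioning()
frame_feat = torch.randn(4, 64, 16, 16)  # (batch, channels, height, width)
sentence = torch.randn(4, 256)           # pooled text-encoder output
print(cond(frame_feat, sentence).shape)  # torch.Size([4, 64, 16, 16])
```

The grouped convolution is one way to apply a different, text-dependent filter bank to each sample in the batch with a single conv2d call; a per-sample loop would be equivalent but slower.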