- Keywords: video generation, GANs, scalable methods
- Abstract: Current state-of-the-art generative models for videos have high computational requirements that impede high-resolution generation beyond a few frames. In this work, we propose a stage-wise strategy to train Generative Adversarial Networks (GANs) for videos. We decompose the generative process to first produce a downsampled video that is then spatially upscaled and temporally interpolated by subsequent stages. Upsampling stages are applied locally on temporal chunks of previous outputs to manage the computational complexity. Each stage is a GAN, and the stages are trained sequentially and independently. We validate our approach on Kinetics-600 and BDD100K, for which we train a three-stage model capable of generating 128x128 videos with 100 frames.
- Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
- One-sentence Summary: We propose a scalable method to generate videos by training a GAN to produce a low-resolution, temporally subsampled version of a video, which is then upsampled by one or more local upsampling stages.
- Supplementary Material: zip
- Reviewed Version (pdf): https://openreview.net/references/pdf?id=v4KsmLKzCj
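
The stage-wise decomposition described in the abstract can be illustrated with a small shape-bookkeeping sketch. It is not the authors' implementation, and it contains no neural networks. It only tracks how video dimensions could grow when upsampling stages are applied to temporal chunks of the previous output. The base resolution (25 frames at 32x32), the 2x spatial and temporal factors per stage, the chunk length of 5 frames, and all names (`VideoShape`, `temporal_chunks`, `run_pipeline`) are assumptions chosen so that three stages reach the 128x128, 100-frame output mentioned in the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VideoShape:
    frames: int
    height: int
    width: int


def temporal_chunks(num_frames, chunk_len):
    """Split frame indices [0, num_frames) into consecutive (start, end) chunks."""
    return [(start, min(start + chunk_len, num_frames))
            for start in range(0, num_frames, chunk_len)]


def run_pipeline(base, stages, chunk_len):
    """Return the video shape after the first stage and after each upsampling stage.

    `stages` is a list of (spatial_factor, temporal_factor) pairs, one per
    upsampling stage. Each stage processes the previous output chunk by chunk,
    so its cost depends on the chunk length rather than the full video length.
    """
    shapes = [base]
    for spatial_factor, temporal_factor in stages:
        prev = shapes[-1]
        out_frames = 0
        for start, end in temporal_chunks(prev.frames, chunk_len):
            # Each chunk is upsampled on its own; here only its length is tracked.
            out_frames += (end - start) * temporal_factor
        shapes.append(VideoShape(out_frames,
                                 prev.height * spatial_factor,
                                 prev.width * spatial_factor))
    return shapes


# Hypothetical three-stage configuration: one low-resolution stage followed by
# two upsampling stages that each double the spatial and temporal resolution.
shapes = run_pipeline(VideoShape(25, 32, 32), [(2, 2), (2, 2)], chunk_len=5)
for stage_index, shape in enumerate(shapes, start=1):
    print(f"stage {stage_index}: {shape.frames} frames at {shape.height}x{shape.width}")
```

Under these assumed factors, the last entry of `shapes` corresponds to a 100-frame, 128x128 video. Because each upsampling stage only ever sees one chunk at a time, the chunk length, not the final video length, bounds the per-step workload in this sketch.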