Abstract: Generative models have emerged as an essential building block for many image synthesis and editing tasks. Recent advances in this field have also enabled the generation of high-quality 3D or video content that exhibits either multi-view or temporal consistency. With our work, we explore 4D generative adversarial networks (GANs) that learn unconditional generation of 3D-aware videos. By combining neural implicit representations with a time-aware discriminator, we develop a GAN framework that synthesizes 3D videos supervised only by monocular videos. We show that our method learns a rich embedding of decomposable 3D structures and motions that enables new visual effects through spatio-temporal rendering, while producing imagery with quality comparable to that of existing 3D or video GANs.
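The following is a minimal, hypothetical sketch of the high-level idea described in the abstract: a latent- and time-conditioned implicit generator paired with a discriminator that sees frames together with their temporal offset. It is not the authors' implementation (see the linked code repository for that); all module names, layer sizes, and input conventions here are illustrative assumptions.

```python
# Hypothetical sketch of a 4D GAN building block: a time-conditioned implicit
# generator and a time-aware frame-pair discriminator. Sizes and names are
# illustrative assumptions, not the paper's actual architecture.
import torch
import torch.nn as nn


class ImplicitGenerator(nn.Module):
    """Maps a latent code, a 3D point, and a timestamp to color and density."""

    def __init__(self, latent_dim=128, hidden=256):
        super().__init__()
        # Input: latent code + (x, y, z) coordinate + scalar time t
        self.net = nn.Sequential(
            nn.Linear(latent_dim + 3 + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # RGB + density, to be volume-rendered per view
        )

    def forward(self, z, xyz, t):
        # z: (B, latent_dim), xyz: (B, N, 3) sampled points, t: (B, 1) timestamps
        B, N, _ = xyz.shape
        z = z.unsqueeze(1).expand(B, N, -1)
        t = t.unsqueeze(1).expand(B, N, 1)
        return self.net(torch.cat([z, xyz, t], dim=-1))  # (B, N, 4)


class TimeAwareDiscriminator(nn.Module):
    """Scores a pair of rendered frames together with their time difference."""

    def __init__(self, img_channels=3, img_size=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(2 * img_channels, 64, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Flatten(),
        )
        feat = 128 * (img_size // 4) ** 2
        self.head = nn.Linear(feat + 1, 1)  # concatenate the time gap as a feature

    def forward(self, frame_a, frame_b, dt):
        # frame_a, frame_b: (B, 3, H, W) frames from the same clip; dt: (B, 1) time gap
        h = self.conv(torch.cat([frame_a, frame_b], dim=1))
        return self.head(torch.cat([h, dt], dim=-1))
```

In such a setup, only monocular video frames are needed as real data: the generator is rendered from sampled camera poses and timestamps, and the discriminator judges frame pairs jointly with their temporal offset, which is one way a time-aware adversarial signal can be realized.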
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: - Acknowledged limitation of 2D upsampling layers decreasing 3D consistency
- Added geometry evaluation results
- Visualized TaiChi depth maps
- Clarified hyperparameter selection
- Clarified inductive bias for motion/content disentanglement
- Clarified missing ACD and CPBD metrics
- Updated Fig. 2
- Added missing references
- Fixed equations
- Fixed citations
Code: https://github.com/sherwinbahmani/3dvideogeneration/
Supplementary Material: zip
Assigned Action Editor: ~Mathieu_Salzmann1
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Submission Number: 791