Keywords: video synthesis, scene graph, scene understanding
TL;DR: A video scene graph-to-video synthesis framework is proposed, built on a pre-trained video scene graph encoder, a VQ-VAE, and an auto-regressive Transformer.
Abstract: Video synthesis has recently attracted considerable attention as a natural extension of the image synthesis task. Most image synthesis works use class labels or text as guidance. However, neither labels nor text can provide explicit temporal guidance, such as when an action starts or ends. To overcome this limitation, we introduce video scene graphs as input for video synthesis, as they represent the spatial and temporal relationships between objects in the scene. Since video scene graphs are usually temporally discrete annotations, we propose a video scene graph (VSG) encoder that not only encodes the existing video scene graphs but also predicts graph representations for unlabeled frames. The VSG encoder is pre-trained with several contrastive multi-modal losses. We then propose a scene graph-to-video synthesis framework (SGVS), based on the pre-trained VSG encoder, a VQ-VAE, and an auto-regressive Transformer, that synthesizes a semantic video given an initial scene image and a variable number of video scene graphs. We evaluate SGVS and other state-of-the-art video synthesis models on the Action Genome dataset and demonstrate the benefit of video scene graphs for video synthesis.
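The abstract outlines a three-stage pipeline (VSG encoder → VQ-VAE frame tokens → auto-regressive Transformer). Below is a minimal PyTorch sketch of that conditioning flow; all module names, dimensions, and layer choices are hypothetical illustrations, not the paper's actual architecture.

```python
# Hypothetical sketch of the SGVS conditioning pipeline described in the abstract.
# Module names, dimensions, and layers are illustrative assumptions only.
import torch
import torch.nn as nn

class VSGEncoder(nn.Module):
    """Encodes per-frame scene-graph features and propagates them over time
    to produce representations for unlabeled frames (simplified placeholder)."""
    def __init__(self, graph_dim=128, hidden_dim=256):
        super().__init__()
        self.proj = nn.Linear(graph_dim, hidden_dim)
        self.temporal = nn.GRU(hidden_dim, hidden_dim, batch_first=True)

    def forward(self, graph_feats):            # (B, T, graph_dim)
        h = torch.relu(self.proj(graph_feats))
        h, _ = self.temporal(h)                # temporally propagated graph embeddings
        return h                               # (B, T, hidden_dim)

class SGVS(nn.Module):
    """Condition an auto-regressive Transformer over VQ-VAE frame tokens
    on the initial frame's tokens and the VSG embeddings."""
    def __init__(self, codebook_size=1024, hidden_dim=256, n_layers=4):
        super().__init__()
        self.vsg_encoder = VSGEncoder(hidden_dim=hidden_dim)
        self.token_emb = nn.Embedding(codebook_size, hidden_dim)
        layer = nn.TransformerEncoderLayer(hidden_dim, nhead=8, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(hidden_dim, codebook_size)

    def forward(self, init_frame_tokens, graph_feats):
        cond = self.vsg_encoder(graph_feats)               # (B, T, D) graph conditioning
        tok = self.token_emb(init_frame_tokens)            # (B, L, D) VQ tokens of initial frame
        x = torch.cat([cond, tok], dim=1)
        mask = nn.Transformer.generate_square_subsequent_mask(x.size(1))
        h = self.transformer(x, mask=mask)                 # causal attention over the sequence
        return self.head(h[:, cond.size(1):])              # logits over next VQ tokens

# Toy usage: 2 videos, 4 annotated scene graphs each, 16 VQ tokens for the initial frame.
model = SGVS()
graphs = torch.randn(2, 4, 128)
init_tokens = torch.randint(0, 1024, (2, 16))
logits = model(init_tokens, graphs)   # (2, 16, 1024)
```

In this sketch the predicted token logits would be decoded back to frames by the VQ-VAE decoder; the contrastive multi-modal pre-training of the VSG encoder is not shown.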
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Generative models
Supplementary Material: zip