Few-shot Video-to-Video Synthesis

Ting-Chun Wang, Ming-Yu Liu, Andrew Tao, Guilin Liu, Bryan Catanzaro, Jan Kautz

06 Sept 2019 (modified: 05 May 2023), NeurIPS 2019
Abstract: Video-to-video synthesis (vid2vid) aims at converting an input semantic video, such as a sequence of human poses or segmentation masks, into an output photorealistic video. While it has recently received increasing attention for its wide range of applications, existing vid2vid approaches share two major limitations. First, they are data-hungry: numerous images of a target human subject or scene are required for training. Second, a learned model has limited generalization capability. For example, while a human vid2vid model can synthesize unseen poses of the person it was trained on, it does not generalize to people not included in the training set. To address these limitations, we propose an adaptive vid2vid framework, which learns to synthesize videos of previously unseen subjects or scenes by leveraging a few example images of the target at test time. Our model achieves this few-shot generalization capability via a novel network weight generation module utilizing an attention mechanism. We conduct extensive experimental validations with comparisons to strong baselines on different datasets. The experimental results verify the effectiveness of the proposed framework in addressing the two limitations of existing vid2vid approaches. Our code will be released upon publication.
Code Link: https://github.com/NVlabs/few-shot-vid2vid
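
The abstract describes the few-shot mechanism only at a high level: a weight generation module attends over a few example images of the target and produces network weights for the synthesis model. The sketch below is a minimal, hypothetical PyTorch illustration of that idea (PyTorch is assumed since the linked repository uses it); the module name, channel sizes, pooling scheme, and the choice of generating a single convolution layer's weights are illustrative assumptions, not the paper's actual architecture.

# Minimal sketch (not the authors' implementation) of attention-based
# network weight generation from a few example images.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentiveWeightGenerator(nn.Module):
    """Attends over K example images of an unseen subject and maps the
    aggregated appearance feature to the weights of one conv layer of the
    synthesis network. All design choices here are assumptions."""

    def __init__(self, feat_ch=64, out_ch=3, in_ch=64, ksize=3):
        super().__init__()
        self.example_enc = nn.Conv2d(3, feat_ch, 3, padding=1)  # encodes example images
        self.label_enc = nn.Conv2d(1, feat_ch, 3, padding=1)    # encodes semantic maps (1-channel labels assumed)
        self.weight_head = nn.Linear(feat_ch, out_ch * in_ch * ksize * ksize)
        self.weight_shape = (out_ch, in_ch, ksize, ksize)

    def forward(self, examples, example_labels, query_label):
        # examples:       (B, K, 3, H, W) few example images of the target
        # example_labels: (B, K, 1, H, W) semantic maps of those examples
        # query_label:    (B, 1, H, W)    semantic map of the frame to synthesize
        B, K = examples.shape[:2]
        # encode and global-average-pool everything to per-image feature vectors
        feats = self.example_enc(examples.flatten(0, 1)).mean(dim=(2, 3)).view(B, K, -1)
        keys = self.label_enc(example_labels.flatten(0, 1)).mean(dim=(2, 3)).view(B, K, -1)
        query = self.label_enc(query_label).mean(dim=(2, 3)).unsqueeze(1)        # (B, 1, C)
        # soft attention: examples whose semantics match the query get larger weight
        attn = F.softmax((query * keys).sum(-1) / keys.shape[-1] ** 0.5, dim=1)  # (B, K)
        pooled = (attn.unsqueeze(-1) * feats).sum(dim=1)                         # (B, C)
        # map the aggregated appearance feature to flattened conv weights
        weights = self.weight_head(pooled)
        return weights.view(B, *self.weight_shape)

# Usage sketch: the generated weights would be plugged into the image
# synthesis network (e.g. via a functional/grouped convolution), one filter
# set per sample in the batch.
gen = AttentiveWeightGenerator()
ex = torch.randn(2, 3, 3, 64, 64)        # B=2 subjects, K=3 example images each
ex_lbl = torch.randn(2, 3, 1, 64, 64)
q_lbl = torch.randn(2, 1, 64, 64)
w = gen(ex, ex_lbl, q_lbl)               # (2, 3, 64, 3, 3) per-sample conv weights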