Keywords: generative models, human motion generation
Abstract: In this paper, we address the problem of 3D human motion generation, which aims at learning a model that generates plausible and diverse future sequences of 3D human poses from an observed pose sequence. Current state-of-the-art solutions propose injecting a single random latent vector into a deterministic motion prediction framework. The stochasticity of the generative process is thus modeled at the whole-sequence level, which is inconsistent with the inherent time-dependent uncertainty of human motion (e.g., a person may either jump or walk after getting up from a chair). To overcome this limitation we propose the Hierarchical Transformer Dynamical Variational Autoencoder (HiT-DVAE), a deep generative model with sequential latent variables that can efficiently learn the stochastic dynamics of human motion. The proposed model learns an expressive time-varying latent space that encodes diverse and realistic human motions. A thorough evaluation on the HumanEva-I and Human3.6M datasets using various metrics shows that HiT-DVAE outperforms current state-of-the-art methods. Our code will be released upon publication.
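For intuition, here is a minimal sketch (not the authors' released code) of the generative step the abstract describes: a fresh latent variable z_t is sampled at every future timestep from a prior conditioned on the motion so far, so stochasticity is modeled per frame rather than once per sequence. All names, dimensions, and the GRU backbone below are illustrative assumptions; HiT-DVAE itself uses hierarchical transformer blocks.

```python
# Illustrative sketch of a sequential-latent (DVAE-style) motion generator.
# Hypothetical module: the real HiT-DVAE architecture is transformer-based.
import torch
import torch.nn as nn

class SequentialLatentGenerator(nn.Module):
    def __init__(self, pose_dim=48, latent_dim=16, hidden_dim=128):
        super().__init__()
        # Recurrent state summarizing past poses and past latents.
        self.rnn = nn.GRUCell(pose_dim + latent_dim, hidden_dim)
        # Time-varying prior p(z_t | past): outputs mean and log-variance.
        self.prior = nn.Linear(hidden_dim, 2 * latent_dim)
        # Decoder p(x_t | z_t, past).
        self.decoder = nn.Linear(hidden_dim + latent_dim, pose_dim)
        self.latent_dim = latent_dim
        self.hidden_dim = hidden_dim

    @torch.no_grad()
    def generate(self, x_obs, horizon):
        # x_obs: (T_obs, B, pose_dim) observed poses -> (horizon, B, pose_dim).
        B = x_obs.size(1)
        h = x_obs.new_zeros(B, self.hidden_dim)
        z = x_obs.new_zeros(B, self.latent_dim)
        for x_t in x_obs:  # encode the observed prefix into the state h
            h = self.rnn(torch.cat([x_t, z], dim=-1), h)
        poses = []
        for _ in range(horizon):
            # Sample a *new* latent at every timestep (reparameterization).
            mu, logvar = self.prior(h).chunk(2, dim=-1)
            z = mu + (0.5 * logvar).exp() * torch.randn_like(mu)
            x_t = self.decoder(torch.cat([h, z], dim=-1))
            poses.append(x_t)
            h = self.rnn(torch.cat([x_t, z], dim=-1), h)
        return torch.stack(poses)

# Usage: repeated calls on the same observation yield diverse futures,
# since a different z_t is drawn at each frame of each rollout.
model = SequentialLatentGenerator()
obs = torch.randn(25, 1, 48)                 # 25 observed frames, batch of 1
futures = [model.generate(obs, horizon=100) for _ in range(5)]
```

Under this per-timestep scheme, uncertainty can grow or branch mid-sequence (the chair example in the abstract), which a single sequence-level latent vector cannot express.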
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Generative models
Supplementary Material: zip