Keywords: Motion generation, Gaussian Mixture, VAE, Autoregressive Generation
TL;DR: A continuous autoregressive approach to text-to-motion generation with Gaussian mixture-guided latent sampling.
Abstract: Existing efforts in motion synthesis typically utilize either generative transformers with discrete representations or diffusion models with continuous representations.
However, the discretization process in generative transformers can introduce motion errors, while the sampling process in diffusion models tends to be slow.
In this paper, we propose GMMotion, a novel text-to-motion synthesis method that combines a continuous motion representation with an autoregressive model, using a Gaussian mixture model (GMM) to represent the conditional probability distribution.
Unlike autoregressive approaches that rely on residual vector quantization, our model employs continuous motion representations derived from the latent space of a variational autoencoder (VAE). This choice streamlines both training and inference.
Specifically, we utilize a causal transformer to learn the distributions of continuous motion representations, which are modeled with a learnable Gaussian mixture model.
Extensive experiments demonstrate that our model surpasses existing state-of-the-art models on the motion synthesis task.
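To make the modeling step concrete, below is a minimal sketch, assuming a PyTorch implementation; the names GMMHead, gmm_nll, sample_step, n_components, and latent_dim are illustrative, not from the paper. It shows how a causal transformer's hidden states could parameterize a learnable diagonal-covariance Gaussian mixture over the next continuous latent, trained by negative log-likelihood.

```python
# A minimal sketch, assuming a PyTorch implementation. GMMHead, gmm_nll,
# and sample_step are hypothetical names; the paper's code may differ.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class GMMHead(nn.Module):
    """Maps causal-transformer hidden states to the parameters of a
    diagonal-covariance Gaussian mixture over the next continuous latent."""
    def __init__(self, hidden_dim: int, latent_dim: int, n_components: int):
        super().__init__()
        self.latent_dim = latent_dim
        self.n_components = n_components
        # One projection yields mixture logits, means, and log-stds per step.
        self.proj = nn.Linear(hidden_dim, n_components * (1 + 2 * latent_dim))

    def forward(self, h: torch.Tensor):
        # h: (batch, seq, hidden_dim)
        k, d = self.n_components, self.latent_dim
        logits, means, log_stds = torch.split(
            self.proj(h), [k, k * d, k * d], dim=-1)
        means = means.view(*h.shape[:-1], k, d)
        log_stds = log_stds.view(*h.shape[:-1], k, d)
        return logits, means, log_stds

def gmm_nll(logits, means, log_stds, target):
    """Negative log-likelihood of target latents under the predicted mixture."""
    target = target.unsqueeze(-2)                      # (batch, seq, 1, dim)
    # Per-component diagonal-Gaussian log-density, summed over latent dims.
    comp_logp = -0.5 * (((target - means) / log_stds.exp()) ** 2
                        + 2 * log_stds + math.log(2 * math.pi)).sum(-1)
    # Mixture log-likelihood via log-sum-exp over the K components.
    logp = torch.logsumexp(F.log_softmax(logits, dim=-1) + comp_logp, dim=-1)
    return -logp.mean()

def sample_step(logits, means, log_stds):
    """Draw one latent per step: pick a component, then sample its Gaussian."""
    comp = torch.distributions.Categorical(logits=logits).sample()
    idx = comp[..., None, None].expand(*comp.shape, 1, means.size(-1))
    mu = means.gather(-2, idx).squeeze(-2)
    std = log_stds.gather(-2, idx).squeeze(-2).exp()
    return mu + std * torch.randn_like(mu)
```

Under this reading, inference would sample a latent at each step, feed it back into the causal transformer, and finally decode the latent sequence to motion with the VAE decoder.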
Supplementary Material: zip
Primary Area: Applications (e.g., vision, language, speech and audio, Creative AI)
Submission Number: 4221