Abstract: In the era of big models for art creation, 3D human motion generation has become a crucial research direction, with Vector Quantized Variational Auto-Encoders (VQ-VAEs) playing a pivotal role in bridging modalities for cross-modal tasks. This paper introduces an Anatomically-Informed VQ-VAE designed to leverage the inherent structure of the human body, a key yet previously underutilized bridge in this domain. The proposed method improves performance by partitioning motion data into anatomically meaningful subgroups, enabling the learning of expressive, semantically meaningful latent representations. The significance of this approach is twofold: it not only achieves state-of-the-art performance on the KIT dataset, but also underscores the necessity of integrating isomorphic components, i.e., those with shared structures across different modalities, into the design of cross-modal tasks. This emphasis on isomorphism paves the way for a deeper understanding of how to map effectively between modalities in AI-driven art generation, opening new avenues for future research.
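To make the partitioning idea concrete, the following is a minimal PyTorch sketch, not the authors' code, of per-part vector quantization: a pose vector is split into anatomically meaningful joint subgroups, and each subgroup is matched against its own codebook. The joint indices, group names, codebook size, and feature dimensions are illustrative assumptions rather than the paper's actual configuration.

```python
import torch

# Hypothetical partition of 20 joints into five body parts
# (illustrative indices, not the paper's actual skeleton layout).
BODY_PARTS = {
    "torso":     [0, 1, 2, 3],
    "left_arm":  [4, 5, 6, 7],
    "right_arm": [8, 9, 10, 11],
    "left_leg":  [12, 13, 14, 15],
    "right_leg": [16, 17, 18, 19],
}
FEATS_PER_JOINT = 3   # e.g. 3D joint positions (assumed)
CODEBOOK_SIZE = 512   # assumed codebook size

# One codebook per anatomical subgroup; each code's dimension
# matches the flattened feature size of its body part.
codebooks = {
    part: torch.randn(CODEBOOK_SIZE, len(joints) * FEATS_PER_JOINT)
    for part, joints in BODY_PARTS.items()
}

def quantize_pose(pose: torch.Tensor) -> dict:
    """Nearest-neighbor quantization of each anatomical subgroup.

    pose: (num_joints, FEATS_PER_JOINT) tensor for a single frame.
    Returns the selected codebook index per body part.
    """
    indices = {}
    for part, joints in BODY_PARTS.items():
        z = pose[joints].reshape(-1)                   # flatten the subgroup
        dists = torch.cdist(z[None], codebooks[part])  # (1, CODEBOOK_SIZE)
        indices[part] = dists.argmin(dim=-1)           # nearest code
    return indices

frame = torch.randn(20, FEATS_PER_JOINT)
print(quantize_pose(frame))
```

In a full model, each subgroup's quantized code would feed a shared decoder, so the latent space factorizes along the body's anatomy rather than treating the pose as one undifferentiated vector.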