Realistic Full-Body Motion Generation from Sparse Tracking with State Space Model

Published: 20 Jul 2024, Last Modified: 21 Jul 2024 · MM 2024 Oral · CC BY 4.0
Abstract: In the domain of generative multimedia and interactive experiences, generating realistic and accurate full-body poses from sparse tracking is crucial for many real-world applications, yet accurate sequence modeling and efficient motion generation remain challenging. Recently, state space models (SSMs) with efficient hardware-aware designs (i.e., Mamba) have shown great potential for sequence modeling, particularly in temporal contexts. However, processing motion data is still challenging for SSMs. Specifically, the sparsity of the input conditions makes motion generation an ill-posed problem, and the complex structure of the human body further complicates the task. To address these issues, we present Motion Mamba Diffusion (MMD), a novel conditional diffusion model that combines the sequence modeling capability of SSMs with the robust generation ability of diffusion models to track full-body poses accurately. In particular, we design a bidirectional Temporal Mamba Module (TMM) to model motion sequences. Additionally, a Spatial Mamba Module (SMM) is proposed for feature enhancement within a single frame. Extensive experiments on the large-scale motion capture dataset AMASS demonstrate that our approach outperforms the latest methods in terms of accuracy and smoothness, achieving new state-of-the-art performance. Moreover, our approach runs in real time, making it well suited for deployment in practical applications. The source code will be made public upon acceptance of this paper.
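To make the architecture described above concrete, the following is a minimal sketch (not the authors' released code, which is promised upon acceptance) of how a bidirectional Temporal Mamba Module and a Spatial Mamba Module could be composed into a conditional diffusion denoiser driven by sparse tracking. All module names, tensor shapes, dimensions, and the use of the `mamba_ssm` package are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch of an MMD-style denoiser; shapes and modules are assumed.
import torch
import torch.nn as nn
from mamba_ssm import Mamba  # pip install mamba-ssm


class TemporalMambaModule(nn.Module):
    """Bidirectional Mamba scan over the time axis of a motion sequence."""

    def __init__(self, d_model: int):
        super().__init__()
        self.fwd = Mamba(d_model=d_model)
        self.bwd = Mamba(d_model=d_model)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, d_model); scan forward and backward in time.
        y = self.fwd(x) + torch.flip(self.bwd(torch.flip(x, dims=[1])), dims=[1])
        return self.norm(x + y)


class SpatialMambaModule(nn.Module):
    """Mamba scan over the joint axis within each frame for feature enhancement."""

    def __init__(self, d_model: int):
        super().__init__()
        self.mamba = Mamba(d_model=d_model)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, joints, d_model); scan along the joint axis.
        b, t, j, d = x.shape
        y = self.mamba(x.reshape(b * t, j, d)).reshape(b, t, j, d)
        return self.norm(x + y)


class MMDDenoiser(nn.Module):
    """Predicts clean full-body motion from a noisy sample, the diffusion
    timestep, and the sparse tracking signal (e.g., head and hand poses)."""

    def __init__(self, n_joints: int = 22, d_joint: int = 6,
                 d_model: int = 256, d_sparse: int = 54):
        super().__init__()
        self.embed = nn.Linear(d_joint, d_model)
        self.cond = nn.Linear(d_sparse, d_model)
        self.time = nn.Sequential(nn.Linear(1, d_model), nn.SiLU(),
                                  nn.Linear(d_model, d_model))
        self.smm = SpatialMambaModule(d_model)
        self.pool = nn.Linear(n_joints * d_model, d_model)
        self.tmm = TemporalMambaModule(d_model)
        self.out = nn.Linear(d_model, n_joints * d_joint)

    def forward(self, x_t, t, sparse):
        # x_t: (B, T, J, d_joint), t: (B, 1), sparse: (B, T, d_sparse)
        h = self.embed(x_t)                      # per-joint features
        h = self.smm(h)                          # spatial enhancement per frame
        b, T, j, d = h.shape
        h = self.pool(h.reshape(b, T, j * d))    # frame-level features
        h = h + self.cond(sparse) + self.time(t).unsqueeze(1)
        h = self.tmm(h)                          # bidirectional temporal modeling
        return self.out(h).reshape(b, T, j, -1)  # predicted clean motion
```

In this sketch the denoiser would be trained with a standard diffusion objective (predicting the clean motion or the noise at each timestep), with the sparse tracking signal injected as a per-frame condition; the actual conditioning scheme and loss used by the paper are not specified in the abstract.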
Primary Subject Area: [Experience] Interactions and Quality of Experience
Secondary Subject Area: [Generation] Generative Multimedia, [Experience] Multimedia Applications, [Experience] Art and Culture
Relevance To Conference: We propose Motion Mamba Diffusion, a novel conditional generation framework that combines the robust generation capabilities of diffusion models with the sequence modeling strengths of state space models. This approach enables accurate generation of realistic human movements from sparse tracking, greatly enhancing the user experience in interactive multimedia applications. Furthermore, the high-quality human body movements produced by Motion Mamba Diffusion have the potential to enrich artistic and cultural expression through digital artworks and interactive multimedia installations. In summary, Motion Mamba Diffusion represents a significant advancement in human-computer interaction, multimedia applications, and the intersection of art and culture, aligning closely with several themes of ACM Multimedia.
Submission Number: 2195