Abstract: We introduce Multi-Motion Discrete Diffusion Models
(M2D2M), a novel approach that leverages the strengths of discrete
diffusion models to generate human motion from textual descriptions of
multiple actions. This approach addresses the challenge of generating
multi-motion sequences, ensuring seamless transitions between motions
and coherence across a series of actions. The strength of M2D2M lies in
its dynamic transition probability within the discrete diffusion model,
which adapts transition probabilities based on the proximity between
motion tokens, encouraging mixing between different modes.
Complemented by a two-phase sampling strategy that includes independent and
joint denoising steps, M2D2M effectively generates long-term, smooth,
and contextually coherent human motion sequences, utilizing a model
trained for single-motion generation. Extensive experiments demonstrate
that M2D2M outperforms current state-of-the-art methods on benchmarks
for motion generation from text descriptions, showcasing its efficacy in
interpreting language semantics and generating dynamic, realistic motions.
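To make the idea of a proximity-dependent transition probability concrete, here is a minimal sketch, not the authors' implementation. The abstract states only that transition probabilities adapt to the proximity between motion tokens. Everything else below is an assumption chosen for illustration: a token stays put with probability 1 - beta_t and otherwise jumps to another codebook entry with weight softmax(-||e_i - e_j||^2 / tau). The function names `dynamic_transition_matrix` and `noise_tokens` and the parameters `beta_t` and `tau` are all hypothetical.

```python
import numpy as np

def dynamic_transition_matrix(codebook, beta_t, tau=1.0):
    """Hypothetical sketch of a one-step transition matrix Q_t over K motion tokens.

    Assumption (not from the paper): with probability 1 - beta_t a token is kept;
    with probability beta_t it jumps to another token j, where nearer codebook
    embeddings receive larger weights via a softmax over negative squared distance.
    Requires K >= 2 so that every row has at least one other token to jump to.
    """
    codebook = np.asarray(codebook, dtype=float)
    K = codebook.shape[0]
    if K < 2:
        raise ValueError("need at least two codebook entries")
    # Pairwise squared Euclidean distances between token embeddings, shape (K, K).
    diff = codebook[:, None, :] - codebook[None, :, :]
    dist2 = np.sum(diff ** 2, axis=-1)
    # Closer tokens get larger logits; self-transitions are excluded from the jump part.
    logits = -dist2 / tau
    np.fill_diagonal(logits, -np.inf)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)
    # Mix "stay" and "jump to a nearby token"; each row sums to 1.
    return (1.0 - beta_t) * np.eye(K) + beta_t * weights

def noise_tokens(tokens, Q, rng):
    """Forward noising step: each token moves independently according to its row of Q."""
    return np.array([rng.choice(Q.shape[0], p=Q[k]) for k in tokens])

# Usage: three 1-D token embeddings; token 0 is much closer to token 1 than to token 2.
Q = dynamic_transition_matrix([[0.0], [1.0], [5.0]], beta_t=0.5)
x_t = noise_tokens([0, 1, 2], Q, np.random.default_rng(0))
```

Under these assumptions, a corrupted token tends to drift toward a semantically similar token rather than to an arbitrary one, which is one plausible way proximity-aware transitions could encourage mixing between neighboring motion modes.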