Keywords: Human Motion Synthesis, Keyframe-Guided Generation, Trajectory Conditioning, Robustness, Diffusion Models
TL;DR: We present MoFA, a diffusion-based motion factorization framework that jointly leverages keyframes and trajectories through LMRS, TAMI, and QADT to generate realistic, consistent, and controllable human motions.
Abstract: Human motion synthesis has recently benefited from diffusion models, achieving unprecedented realism and diversity. Yet precise and controllable generation remains challenging: text, audio, and 2D cues are often ambiguous, while existing trajectory-keyframe approaches suffer from limited generalization, naive feature fusion, and poor robustness to unpaired control signals. We identify this bottleneck as the entanglement between keyframe and trajectory signals, which are inherently coupled in training but frequently mismatched at inference. To address this, we propose MoFA, a diffusion-based Motion Factorization framework that decomposes synthesis into two complementary sub-tasks: (i) Local Motion Completion, which focuses on keyframe dynamics, and (ii) Trajectory Adaptation, which ensures global spatial consistency. MoFA integrates the Local Motion Refinement Stack (LMRS) and Trajectory-Aware Motion Integration (TAMI) to jointly refine local poses and adapt them to trajectories. In addition, we introduce a Quality-Aware Dual Training (QADT) strategy that leverages imperfect or low-quality data as auxiliary supervision, substantially expanding the effective training set and improving generalization. Extensive experiments demonstrate that MoFA achieves more stable, controllable, and robust motion synthesis than state-of-the-art baselines.
Supplementary Material: zip
Primary Area: generative models
Submission Number: 11051