MoHI: Boosting Motion Generation via Human Intention Understanding

ACM SGA 2025 Workshop TriFusion Submission4 Authors

12 Sept 2025 (modified: 16 Sept 2025), ACM SGA 2025 Workshop TriFusion Submission, Readers: Everyone, License: CC BY 4.0
Keywords: Motion Generation, Human Intention Understanding, Motion Caption
Abstract: We propose MoHI, a motion generation framework that explicitly models human intention as the underlying cause of motion. By disentangling intention prediction from motion synthesis during training while jointly optimizing the two objectives, MoHI captures the motivational logic behind human actions and provides clearer semantic guidance for generating coherent motion. Experiments on HumanML3D demonstrate state-of-the-art performance, with a +4.5% improvement in R-Precision Top-1 and a 38.6% lower FID than the previous best method. When fine-tuned for motion captioning, MoHI also outperforms recent LLM-based approaches, highlighting its unified strength in both motion understanding and generation.
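The two metrics quoted above are the standard HumanML3D text-to-motion metrics. The sketch below shows how they are commonly computed, assuming text and motion feature vectors have already been produced by a pretrained evaluator (not shown) and assuming the usual pool of 32 matched text-motion pairs for R-Precision. It is a generic illustration of the metrics, not MoHI's evaluation code. The function names `frechet_distance` and `r_precision_top1` are hypothetical.

```python
import numpy as np


def _sqrtm_psd(mat):
    """Square root of a symmetric positive semi-definite matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)  # remove tiny negative eigenvalues caused by round-off
    return (vecs * np.sqrt(vals)) @ vecs.T


def frechet_distance(feats_a, feats_b):
    """FID-style Frechet distance between two feature sets (rows are samples).

    FID = ||mu_a - mu_b||^2 + Tr(cov_a + cov_b - 2 (cov_a cov_b)^{1/2})
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Tr((cov_a cov_b)^{1/2}) equals Tr((sqrt(cov_a) cov_b sqrt(cov_a))^{1/2}),
    # so the matrix square root is only ever taken of a symmetric PSD matrix.
    root_a = _sqrtm_psd(cov_a)
    cross = _sqrtm_psd(root_a @ cov_b @ root_a)
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * np.trace(cross))


def r_precision_top1(text_emb, motion_emb, pool_size=32):
    """Fraction of texts whose paired motion is the nearest among `pool_size` candidates.

    Row i of text_emb is assumed to describe row i of motion_emb.
    The pool size of 32 is an assumption based on the common HumanML3D protocol.
    """
    n = len(text_emb) - len(text_emb) % pool_size  # drop an incomplete final pool
    hits = 0
    for start in range(0, n, pool_size):
        t = text_emb[start:start + pool_size]
        m = motion_emb[start:start + pool_size]
        # dists[i, j] = Euclidean distance between text i and motion j
        dists = np.linalg.norm(t[:, None, :] - m[None, :, :], axis=-1)
        hits += int(np.sum(np.argmin(dists, axis=1) == np.arange(len(t))))
    return hits / n
```

Lower FID means the generated motion features are distributed more like real motion features, while a higher R-Precision Top-1 means each text retrieves its own motion more often, which is why the abstract reports an increase in one and a decrease in the other.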
Submission Number: 4