Scaling Large Motion Models with Million-Level Human Motions

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: We introduce a million-scale motion dataset to ease data scarcity. We also explore scaling laws in motion generation.
Abstract: Inspired by the recent success of LLMs, the field of human motion understanding has increasingly shifted toward developing large motion models. Despite some progress, current efforts remain far from achieving truly generalist models, primarily due to the lack of massive high-quality data. To address this gap, we present MotionLib, the first million-level dataset for motion generation, which is at least 15$\times$ larger than existing counterparts and enriched with hierarchical text descriptions. Using MotionLib, we train a large motion model named Being-M0, demonstrating robust performance across a wide range of human activities, including unseen ones. Through systematic investigation, for the first time, we highlight the importance of scaling both data and model size for advancing motion generation, along with key insights to achieve this goal. To better integrate the motion modality, we propose MotionBook, an innovative motion encoding approach comprising (1) a compact yet lossless feature representation for motion; (2) a novel 2D lookup-free motion tokenizer that preserves fine-grained motion details while expanding codebook capacity, significantly enhancing the representational power of motion tokens. We believe this work lays the groundwork for developing more versatile and powerful motion generation models in the future. For further details, visit https://beingbeyond.github.io/Being-M0/.
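For readers unfamiliar with lookup-free quantization, the sketch below illustrates the general idea the abstract alludes to: binarizing each latent channel so that a D-dimensional latent implicitly indexes a codebook of size 2^D without any learned embedding table, which is how codebook capacity can grow exponentially with latent width. This is a minimal PyTorch sketch under that assumption; the class name `LookupFreeQuantizer`, the sign-based binarization, and all tensor shapes are illustrative choices, not Being-M0's actual 2D motion tokenizer.

```python
import torch
import torch.nn as nn

class LookupFreeQuantizer(nn.Module):
    """Minimal lookup-free quantizer (illustrative sketch, not the paper's tokenizer):
    each latent channel is binarized to {-1, +1}, so a D-dim latent indexes an
    implicit codebook of size 2**D with no learned embedding table."""

    def __init__(self, dim: int):
        super().__init__()
        self.dim = dim
        # Powers of two used to pack the per-channel bits into a single token id.
        self.register_buffer("basis", 2 ** torch.arange(dim))

    def forward(self, z: torch.Tensor):
        # z: (..., dim) continuous latents from a motion encoder.
        q = torch.where(z > 0, torch.ones_like(z), -torch.ones_like(z))  # channel-wise sign quantization
        # Straight-through estimator so gradients still flow to the encoder.
        q = z + (q - z).detach()
        # Token id = binary code formed by the signs (bit i set iff channel i > 0).
        idx = ((q > 0).long() * self.basis).sum(dim=-1)
        return q, idx

# Usage: with dim=16 the implicit codebook already has 2**16 = 65,536 entries.
quantizer = LookupFreeQuantizer(dim=16)
latents = torch.randn(4, 196, 16)   # (batch, motion frames, dim); dummy values
tokens, ids = quantizer(latents)
print(tokens.shape, ids.shape)      # torch.Size([4, 196, 16]) torch.Size([4, 196])
```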
Lay Summary: Inspired by advanced large language models, we aim to create large models that can truly understand and generate diverse human motions. However, progress is hindered by a lack of massive, high-quality motion data. To address this, we introduce MotionLib, a dataset with over a million human motions, much larger than existing ones, where each motion has detailed descriptions. We also develop MotionBook, a new method that compactly and accurately represents motions while preserving fine-grained details, helping models learn them efficiently. Using these, we train Being-M0, which shows strong performance across many activities, including those it has not seen before. Our work highlights that larger datasets and models are key to improving motion generation, paving the way for more versatile large models in fields like gaming and robotics.
Link To Code: https://beingbeyond.github.io/Being-M0/
Primary Area: Applications->Computer Vision
Keywords: motion generation
Submission Number: 6549