OpenT2M: No-frill Motion Generation with Open-source, Large-scale, High-quality Data

16 Sept 2025 (modified: 13 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: human motion generation, large multimodal model, human motion understanding
Abstract: Text-to-motion (T2M) generation aims to create realistic human movements from text descriptions, with promising applications in animation and robotics. Despite recent progress, current T2M models perform poorly on unseen text descriptions due to the small scale and limited diversity of existing motion datasets. To address this problem, we introduce OpenT2M, a million-scale, high-quality, open-source motion dataset containing over 2800 hours of human motion. Each sequence undergoes rigorous quality control through physical feasibility validation and multi-granularity filtering, and is paired with detailed second-level text annotations. We also develop an automated pipeline for creating long-horizon sequences, enabling complex motion generation. Building upon OpenT2M, we introduce no-frill, a pretrained T2M model that achieves excellent performance without complicated designs or engineering tricks. Its core component is 2D-PRQ, a novel motion tokenizer that captures spatial and temporal dependencies by dividing the human body into five parts. Comprehensive experiments show that OpenT2M significantly improves the generalization of existing T2M models, while 2D-PRQ achieves superior reconstruction and strong zero-shot performance. We expect OpenT2M and no-frill to advance the T2M field by addressing longstanding data quality and benchmarking challenges. Our data and code are released at https://anonymous.4open.science/r/OpenT2M.
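To make the tokenizer description above concrete, the sketch below shows one plausible reading of a part-wise residual-quantization motion tokenizer: the joint set is split into five body parts, each part is encoded separately, and each part's features pass through a small residual-VQ stack. This is only a minimal, hedged sketch; the part split (torso, two arms, two legs), joint counts, feature dimensions, layer counts, and all class names are illustrative assumptions and not the authors' 2D-PRQ configuration.

```python
# Hypothetical sketch of a part-wise residual-quantization motion tokenizer.
# All dimensions, part splits, and layer counts are illustrative assumptions.
import torch
import torch.nn as nn


class ResidualVQ(nn.Module):
    """Quantize a feature sequence with a stack of codebooks applied to the residual."""

    def __init__(self, dim=128, codebook_size=512, num_layers=2):
        super().__init__()
        self.codebooks = nn.ModuleList(
            nn.Embedding(codebook_size, dim) for _ in range(num_layers)
        )

    def forward(self, z):  # z: (B, T, dim)
        residual, quantized, codes = z, torch.zeros_like(z), []
        for cb in self.codebooks:
            # Nearest codebook entry for the current residual.
            d = (residual.unsqueeze(-2) - cb.weight).pow(2).sum(-1)  # (B, T, K)
            idx = d.argmin(dim=-1)                                   # (B, T)
            q = cb(idx)
            quantized = quantized + q
            residual = residual - q
            codes.append(idx)
        # Straight-through estimator so gradients flow back to the encoder.
        quantized = z + (quantized - z).detach()
        return quantized, torch.stack(codes, dim=-1)


class PartWiseTokenizer(nn.Module):
    """Encode each body part separately, then residual-quantize its features."""

    def __init__(self, joints_per_part=(9, 3, 3, 4, 4), feat_per_joint=6, dim=128):
        super().__init__()
        self.parts = joints_per_part
        self.encoders = nn.ModuleList(
            nn.Sequential(nn.Linear(j * feat_per_joint, dim), nn.ReLU(), nn.Linear(dim, dim))
            for j in joints_per_part
        )
        self.quantizers = nn.ModuleList(ResidualVQ(dim) for _ in joints_per_part)

    def forward(self, motion):  # motion: (B, T, total_joints, feat_per_joint)
        outputs, codes, start = [], [], 0
        for enc, vq, j in zip(self.encoders, self.quantizers, self.parts):
            part = motion[:, :, start:start + j].flatten(2)  # (B, T, j * feat)
            q, idx = vq(enc(part))
            outputs.append(q)
            codes.append(idx)
            start += j
        return torch.cat(outputs, dim=-1), codes


if __name__ == "__main__":
    model = PartWiseTokenizer()
    x = torch.randn(2, 60, sum((9, 3, 3, 4, 4)), 6)  # 2 clips, 60 frames
    feats, codes = model(x)
    print(feats.shape, [c.shape for c in codes])
```

The per-part encoders capture spatial structure within each limb, while the frame-wise residual codes preserve temporal detail; how the released 2D-PRQ actually couples the two dimensions may differ from this sketch.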
Primary Area: datasets and benchmarks
Submission Number: 7068