Intend to Move: A Multimodal Dataset for Intention-Aware Human Motion Understanding

Published: 18 Sept 2025 · Last Modified: 30 Oct 2025 · NeurIPS 2025 Datasets and Benchmarks Track (poster) · License: CC BY 4.0
Keywords: human motion dataset, human motion prediction, human intention, multimodal
TL;DR: Intend to Move (I2M) is a new multimodal dataset for embodied AI, designed for intention-aware human motion understanding in real-world environments.
Abstract: Human motion is inherently intentional, yet most motion modeling paradigms focus on low-level kinematics, overlooking the semantic and causal factors that drive behavior. Existing datasets further limit progress: they capture short, decontextualized actions in static scenes, providing little grounding for embodied reasoning. To address these limitations, we introduce $\textit{Intend to Move (I2M)}$, a large-scale, multimodal dataset for intention-grounded motion modeling. I2M contains 10.1 hours of two-person 3D motion sequences recorded in dynamic, realistic home environments, accompanied by multi-view RGB-D video, 3D scene geometry, and language annotations of each participant's evolving intentions. Benchmark experiments reveal a fundamental gap in current motion models: they fail to translate high-level goals into physically and socially coherent motion. I2M thus serves not only as a dataset but also as a benchmark for embodied intelligence, enabling research on models that can reason about, predict, and act upon the "why" behind human motion.
Croissant File: json
Dataset URL: https://ummaaa.github.io/projects/intend-to-move/
Primary Area: Datasets & Benchmarks for applications in computer vision
Submission Number: 787