MolmoAct: Action Reasoning Models that can Reason in Space

Published: 06 Sept 2025, Last Modified: 26 Sept 2025, CoRL 2025 Robot Data Workshop, CC BY 4.0
Keywords: Action Reasoning Model, Reasoning in Space, Vision-Language-Action Model, Robots, Learning, Manipulation
Abstract: Reasoning is central to purposeful action, yet most robotic foundation models map perception and instructions directly to control, which limits adaptability, generalization, and semantic grounding. We introduce Action Reasoning Models (ARMs), a class of robotic foundation models that integrates perception, planning, and control through a structured three-stage pipeline. Our model, MolmoAct, encodes observations and instructions into depth-aware perception tokens, generates mid-level spatial plans as editable trajectory traces, and predicts precise low-level actions, enabling explainable and steerable behavior. MolmoAct achieves strong performance across simulation and real-world settings: 70.5% zero-shot accuracy on SimplerEnv Visual Matching tasks, surpassing closed-source π0 and GR00T N1; 86.6% average success on LIBERO, including a +6.3% gain over ThinkAct on long-horizon tasks; and in real-world fine-tuning, +10% (single-arm) and +22.7% (bimanual) task progression over π0-Fast. It also outperforms baselines by +23.3% on out-of-distribution generalization and achieves top human-preference scores for open-ended instruction following and trajectory steering. Furthermore, we release, for the first time, the MolmoAct Dataset, a mid-training robot dataset comprising over 10,000 high-quality robot trajectories across diverse scenarios and tasks. Training with this dataset yields an average 5.5% improvement in general performance over the base model. We release all model weights, training code, the MolmoAct Dataset, and our action reasoning dataset, establishing MolmoAct as both a state-of-the-art robotics foundation model and an open blueprint for building ARMs that transform perception into purposeful action through grounded reasoning.
Lightning Talk Video: mp4
Submission Number: 29