Keywords: robotics, simulation, manipulation, sim2real, mobile manipulation
TL;DR: MolmoBot demonstrates that large-scale, diverse simulation data alone can produce generalist manipulation policies that transfer zero-shot to unseen real-world environments, objects, and task instructions.
Abstract: A prevailing view in robot learning is that simulation alone is not enough; effective sim-to-real transfer is widely believed to require at least some real-world data collection or task-specific fine-tuning to bridge the gap between simulated and physical environments. We challenge that assumption.
With sufficiently large-scale and diverse synthetic training data, we show that zero-shot transfer to the real world is not only possible but effective for both static and mobile manipulation.
We introduce *MolmoBot-Engine*, a fully open-source pipeline for procedural data generation across robots, tasks, and diverse simulated environments in MolmoSpaces. With it, we release *MolmoBot-Data*, a dataset of 1.7 million expert trajectories for articulated object manipulation and pick-and-place tasks.
We train three policy classes: *MolmoBot*, a Molmo2-based multi-frame vision-language model with a flow-matching action head; *MolmoBot-Pi0*, which replicates the
architecture to enable direct comparison; and *MolmoBot-SPOC*, a lightweight policy suitable for edge deployment and amenable to RL fine-tuning.
We evaluate on two robotic platforms: the Franka FR3 for tabletop manipulation tasks and the Rainbow Robotics RB-Y1 mobile manipulator for door opening, drawer manipulation, cabinet interaction, and mobile pick-and-place.
Without any real-world fine-tuning, our policies achieve zero-shot transfer to unseen objects and environments. On tabletop pick-and-place, MolmoBot achieves a success rate of 79.2\% in real-world evaluations across 4 settings, outperforming
at 39.2\%. Our results demonstrate that procedural environment generation combined with diverse articulated assets can produce robust manipulation policies that generalize broadly to the real world.
Videos are available [here](https://allenai.github.io/MolmoBot/).
Submission Number: 20