Keywords: Video World Models, Synthetic Data, Behavior Generalization, Environment Generalization
TL;DR: We propose neural trajectories, a pipeline for augmenting robot training data with synthetic rollouts from video world models, enabling robots to perform entirely new behaviors in unseen environments.
Abstract: In this work, we unlock new capabilities in robot learning from neural trajectories: synthetic robot data generated by video world models. Our proposed recipe is simple but powerful: we take recent state-of-the-art video generative models (world models), adapt them to the target robot embodiment, and generate new synthetic robot data for the same task or even for new behaviors. Since these video world models generate only videos, we explore two techniques for obtaining robot actions: extracting latent actions from a general-purpose latent action model, and predicting actions with an inverse-dynamics model (IDM), giving flexibility across diverse scenarios. Our approach unlocks behavior and environment generalization, allowing a humanoid robot to perform 20+ new behaviors in unseen environments while collecting teleoperation data only for pick-and-place in a single environment. By introducing a new world-modeling benchmark, we demonstrate that stronger video world models directly correlate with improved downstream robot policy performance. This establishes a new scaling dimension beyond simply collecting additional teleoperation data, changing how we approach robot learning.
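To make the recipe concrete, below is a minimal Python sketch of the pipeline as the abstract describes it: roll out an embodiment-adapted video world model, then label the generated frames with actions from an inverse-dynamics model. All names here (`world_model`, `idm`, their methods, and parameters) are hypothetical placeholders for illustration, not the paper's actual API.

```python
# Hedged sketch of the neural-trajectories pipeline from the abstract.
# `world_model.generate` and `idm.predict` are assumed interfaces, not
# the authors' implementation.

def generate_neural_trajectories(world_model, idm, prompt_frames,
                                 task_prompt, n_rollouts=100):
    """Generate synthetic robot trajectories from a video world model,
    then label them with actions from an inverse-dynamics model (IDM)."""
    dataset = []
    for _ in range(n_rollouts):
        # 1. Roll out the embodiment-adapted video world model to get a
        #    synthetic video of the robot performing the prompted behavior.
        video = world_model.generate(prompt_frames, text=task_prompt)

        # 2. Recover an action for each consecutive frame pair. The abstract
        #    names two options: latent actions from a general-purpose latent
        #    action model, or predicted actions from an IDM (used here).
        actions = [idm.predict(video[t], video[t + 1])
                   for t in range(len(video) - 1)]

        dataset.append({"observations": video, "actions": actions})
    # Synthetic (observation, action) pairs to mix into policy training.
    return dataset
```

The design choice reflected here is that the world model and the action labeler are decoupled: the same generated videos can be labeled by either action-recovery technique, which is what gives the pipeline flexibility across scenarios.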
Supplementary Material: zip
Submission Number: 836