World Action Models are Zero-shot Policies

Published: 02 Mar 2026 · Last Modified: 05 Mar 2026 · ICLR 2026 Workshop World Models · CC BY 4.0
Keywords: Robot Learning: Imitation Learning, Robot Learning: Model Learning, Robot Learning: World Model
TL;DR: DreamZero is a World Action Model robot policy that achieves state-of-the-art task generalization, real-time control, and strong cross-robot transfer.
Abstract: State-of-the-art Vision-Language-Action (VLA) models excel at semantic generalization but struggle to generalize to unseen physical motions in novel environments. We introduce DreamZero, a World Action Model (WAM) built on a pretrained video diffusion backbone. Unlike VLAs, WAMs learn physical dynamics by predicting future world states and actions, using video as a dense representation of how the world evolves. By jointly modeling video and action, DreamZero effectively learns diverse skills from heterogeneous robot data without relying on repetitive demonstrations, resulting in over 2× improvement in generalization to new tasks and environments compared to state-of-the-art VLAs in real-robot experiments. Crucially, through model and system optimizations, we enable a 14B autoregressive video diffusion model to perform real-time closed-loop control at 7 Hz. Finally, we demonstrate cross-embodiment transfer in both directions: (1) video-only demonstrations from other robots or humans improve unseen task performance by over 40% with just 10–20 minutes of data, and (2) DreamZero adapts to entirely new embodiments, achieving zero-shot generalization on the YAM robot with only 30 minutes of play data.
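The abstract's core loop, jointly predicting future world states and action chunks from the current observation, then executing actions closed-loop, can be sketched as a toy example. This is a minimal illustration only: the class and function names, dimensions, and the linear stand-in for the 14B video diffusion backbone are all hypothetical, not the authors' implementation.

```python
import numpy as np

class ToyWorldActionModel:
    """Toy stand-in for a World Action Model (WAM): given the current
    observation, jointly predict the next world state ("frame") and a
    chunk of actions. A real WAM would be an autoregressive video
    diffusion model; a fixed random linear map keeps this self-contained."""

    def __init__(self, obs_dim=16, act_dim=7, chunk=4, seed=0):
        rng = np.random.default_rng(seed)
        self.W_obs = rng.standard_normal((obs_dim, obs_dim)) * 0.1
        self.W_act = rng.standard_normal((act_dim * chunk, obs_dim)) * 0.1
        self.act_dim, self.chunk = act_dim, chunk

    def predict(self, obs):
        next_obs = np.tanh(self.W_obs @ obs)  # predicted future world state
        actions = (self.W_act @ obs).reshape(self.chunk, self.act_dim)
        return next_obs, actions

def control_loop(model, obs, steps=3):
    """Closed-loop control: each tick, jointly predict frame + action
    chunk, execute the chunk, then re-observe (here: the predicted
    frame substitutes for a new camera observation)."""
    executed = []
    for _ in range(steps):
        obs, actions = model.predict(obs)
        executed.append(actions)
    return np.concatenate(executed)

model = ToyWorldActionModel()
trajectory = control_loop(model, np.ones(16))
print(trajectory.shape)  # (steps * chunk, act_dim) = (12, 7)
```

In the real system this loop would run at 7 Hz, with the diffusion model's predicted video serving as the dense dynamics representation rather than a vector.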
Supplementary Material: zip
Submission Number: 110