WARPD – World-model Assisted Reactive Policy Diffusion

ICLR 2026 Conference Submission 22422 Authors

20 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, Readers: Everyone, License: CC BY 4.0

Keywords: World Models, Imitation Learning, Latent Diffusion, Robotics, Policy Learning, Parameter Generation

TL;DR: As an alternative to diffusion policies, we generate closed-loop policies instead of trajectories using a hypernetwork VAE, a world model, and latent diffusion. This enables fewer diffusion queries, robustness to perturbations, and smaller inference-time policies.

Abstract: With the increasing availability of open-source robotic data, imitation learning has become a promising approach for both manipulation and locomotion. Diffusion models are now widely used to train large, generalized policies that predict controls or trajectories, leveraging their ability to model multimodal action distributions. However, this generality comes at the cost of larger model sizes and slower inference, an acute limitation for robotic tasks requiring high control frequencies. Moreover, Diffusion Policy (DP), a popular trajectory-generation approach, suffers from a trade-off between performance and action horizon: using fewer diffusion queries requires larger trajectory chunks, which are executed open-loop and therefore accumulate tracking errors. To overcome these challenges, we introduce WARPD (World-model Assisted Reactive Policy Diffusion), a method that directly generates closed-loop policies (weights for neural policies) instead of open-loop trajectories. By learning behavioral distributions in parameter space rather than trajectory space, WARPD offers two major advantages: (1) extended action horizons with robustness to perturbations, while maintaining high task performance, and (2) significantly reduced inference costs. Empirically, WARPD outperforms DP in long-horizon and perturbed environments, and achieves multitask performance on par with DP while requiring only approximately 1/45th of the inference-time FLOPs per step.
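The distinction the abstract draws between generating trajectories and generating policy weights can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the one-hidden-layer MLP, the names `generate_theta`, `step_env`, and `rollout_generated_policy`, and the toy dynamics are hypothetical, and `generate_theta` is a random placeholder standing in for the latent diffusion model and hypernetwork VAE decoder, which are not implemented here. The sketch only shows the control flow: the generator is queried once per horizon, while the small generated policy reacts to the current observation at every step.

```python
import numpy as np

def num_params(obs_dim, hidden_dim, act_dim):
    """Parameter count of a one-hidden-layer MLP (hypothetical policy architecture)."""
    return obs_dim * hidden_dim + hidden_dim + hidden_dim * act_dim + act_dim

def unflatten_policy(theta, obs_dim, hidden_dim, act_dim):
    """Split a flat parameter vector into MLP weights and biases."""
    shapes = [(obs_dim, hidden_dim), (hidden_dim,), (hidden_dim, act_dim), (act_dim,)]
    params, start = [], 0
    for shape in shapes:
        n = int(np.prod(shape))
        params.append(theta[start:start + n].reshape(shape))
        start += n
    assert start == theta.size, "parameter vector has the wrong length"
    return params

def policy_act(params, obs):
    """Forward pass of the small generated policy."""
    w1, b1, w2, b2 = params
    return np.tanh(obs @ w1 + b1) @ w2 + b2

def generator_queries(episode_len, horizon):
    """Generator calls needed when one call covers `horizon` control steps."""
    return -(-episode_len // horizon)  # ceiling division

def rollout_generated_policy(generate_theta, step_env, obs0, episode_len, horizon, dims):
    """Query the generator every `horizon` steps; act closed-loop in between."""
    obs, queries = obs0, 0
    for t in range(episode_len):
        if t % horizon == 0:
            # Expensive call: produce a fresh set of policy weights.
            params = unflatten_policy(generate_theta(obs), *dims)
            queries += 1
        # Cheap call: the action depends on the current observation.
        action = policy_act(params, obs)
        obs = step_env(obs, action)
    return obs, queries

# Toy usage with placeholder components (assumptions, not the paper's models).
dims = (3, 16, 3)  # obs_dim, hidden_dim, act_dim
rng = np.random.default_rng(0)

def generate_theta(obs):
    # Placeholder for "latent diffusion + VAE decoder": random weights.
    return rng.normal(scale=0.1, size=num_params(*dims))

def step_env(obs, action):
    # Hypothetical toy dynamics: the action nudges the state.
    return obs + 0.01 * action

final_obs, queries = rollout_generated_policy(
    generate_theta, step_env, np.zeros(3), episode_len=100, horizon=25, dims=dims
)
print(queries)  # 4 generator calls for 100 control steps
```

In a trajectory-generation loop, the same four queries would each return 25 precomputed actions that ignore observations arriving mid-chunk. Here the per-step cost is one forward pass through the small MLP, which is the sense in which the generated policy stays reactive between queries.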

Supplementary Material: zip

Primary Area: applications to robotics, autonomy, planning

Submission Number: 22422
