Keywords: World Models, Imitation Learning, Reinforcement Learning, Diffusion Policies
TL;DR: Fine-tuning diffusion policies by exploring entirely inside frozen world models learned from unstructured play data.
Abstract: Fine-tuning diffusion policies with reinforcement learning (RL) is challenging due to the long denoising sequence, which impedes reward propagation, and the high sample requirements of standard RL. While prior work frames the denoising process as a Markov Decision Process to enable policy updates, it still relies heavily on costly environment interactions. We propose DiWA, a novel framework that fine-tunes diffusion-based robotic skills entirely offline using a world model and RL. Unlike model-free methods that require extensive online interaction, DiWA leverages a world model trained on just a few hours of teleoperated play, enabling efficient and safe adaptation. On the CALVIN benchmark, DiWA improves performance across eight tasks using only offline adaptation, while baselines rely on hundreds of thousands of real-world interaction steps. To our knowledge, this is the first method to fine-tune diffusion policies for real-world robotic skills using an offline world model.
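The abstract describes two ingredients: treating the reverse denoising chain of the policy as an MDP so it can be updated with policy gradients, and performing all rollouts inside a frozen world model rather than the real environment. The sketch below is not the authors' code; it is a minimal, hedged illustration of that idea under toy assumptions (randomly initialized placeholder networks, a stand-in reward head, illustrative dimensions and step counts), using a REINFORCE-style update over imagined trajectories.

```python
# Hedged sketch (not the DiWA implementation): RL fine-tuning of a diffusion
# policy entirely inside a frozen world model. Each reverse denoising step is
# treated as an MDP action with a Gaussian log-probability, and rewards come
# from imagined world-model rollouts. All names, shapes, and the reward head
# are illustrative assumptions.

import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, DENOISE_STEPS, HORIZON = 32, 7, 10, 16


class DiffusionPolicy(nn.Module):
    """Predicts the mean of the next (less noisy) action given state, noisy action, step index."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + ACTION_DIM + 1, 128), nn.ReLU(),
            nn.Linear(128, ACTION_DIM),
        )

    def forward(self, state, noisy_action, k):
        k_feat = torch.full_like(noisy_action[:, :1], float(k) / DENOISE_STEPS)
        return self.net(torch.cat([state, noisy_action, k_feat], dim=-1))


class FrozenWorldModel(nn.Module):
    """Stand-in for a world model pretrained on play data; its parameters stay frozen."""
    def __init__(self):
        super().__init__()
        self.dynamics = nn.Linear(STATE_DIM + ACTION_DIM, STATE_DIM)
        self.reward = nn.Linear(STATE_DIM, 1)  # hypothetical reward head for the target skill
        for p in self.parameters():
            p.requires_grad_(False)

    def step(self, state, action):
        next_state = torch.tanh(self.dynamics(torch.cat([state, action], dim=-1)))
        return next_state, self.reward(next_state).squeeze(-1)


def sample_action(policy, state, sigma=0.1):
    """Run the reverse denoising chain; accumulate log-probs of each denoising 'action'."""
    a = torch.randn(state.shape[0], ACTION_DIM)
    log_prob = torch.zeros(state.shape[0])
    for k in reversed(range(DENOISE_STEPS)):
        dist = torch.distributions.Normal(policy(state, a, k), sigma)
        a = dist.sample()
        log_prob = log_prob + dist.log_prob(a).sum(-1)
    return a, log_prob


def finetune(policy, world_model, iters=100, batch=64, lr=1e-4):
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    for _ in range(iters):
        state = torch.randn(batch, STATE_DIM)  # imagined start states (no real interaction)
        total_reward = torch.zeros(batch)
        total_log_prob = torch.zeros(batch)
        for _ in range(HORIZON):
            action, log_prob = sample_action(policy, state)
            state, reward = world_model.step(state, action)
            total_reward = total_reward + reward
            total_log_prob = total_log_prob + log_prob
        # REINFORCE with a batch-mean baseline; all rollouts stay inside the world model.
        advantage = total_reward - total_reward.mean()
        loss = -(advantage.detach() * total_log_prob).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()


if __name__ == "__main__":
    finetune(DiffusionPolicy(), FrozenWorldModel())
```

In the actual method, the world model would be learned from teleoperated play data and kept frozen during adaptation; the toy networks above only stand in for that component to show where the frozen dynamics, the denoising-as-MDP log-probabilities, and the offline policy update fit together.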
Submission Number: 12