Steering Diffusion Policies with Value-Guided Denoising

Published: 16 Sept 2025, Last Modified: 16 Sept 2025, CoRL 2025 Poster, CC BY 4.0
Keywords: Reinforcement learning, imitation learning, diffusion models, diffusion policy, behavior cloning, robotic manipulation, online learning, sample efficiency, deployment, sim-to-real transfer.
TL;DR: We steer robotics diffusion policies with a Q-value function during DDIM denoising steps, outperforming previous fine-tuning methods on Robomimic tasks.
Abstract: Diffusion-based robotic policies trained with imitation learning have achieved remarkable results in complex manipulation tasks. However, such policies are constrained by the quality and coverage of their training data, which limits their adaptation to new environments. Existing approaches to this problem typically rely on fine-tuning the diffusion model, which can be unstable and requires costly human demonstrations. We instead study the online adaptation of pretrained diffusion policies without parameter updates. We introduce $\textit{Value-Guided Denoising}$ (VGD), a simple method that steers a frozen diffusion policy using gradients from a reinforcement-learned value function. At inference time, VGD guides the diffusion denoising steps toward actions with higher Q-values, enabling adaptation with only black-box access to the pretrained policy. On Robomimic benchmarks, our method achieves substantially higher success rates than existing RL-with-diffusion approaches. These results demonstrate that diffusion policies can be steered efficiently at deployment, yielding strong performance gains with minimal data and computation. Code available at https://anonymous.4open.science/r/VGD.
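The abstract describes steering a frozen diffusion policy by following Q-value gradients between denoising steps. The snippet below is a minimal PyTorch sketch of that idea, not the authors' implementation: `policy.denoise_step`, `policy.action_dim`, `q_function`, and `guidance_scale` are assumed placeholder interfaces chosen for illustration.

```python
import torch

@torch.no_grad()
def value_guided_denoise(policy, q_function, obs, num_steps=10, guidance_scale=1.0):
    """Sketch of value-guided denoising with a frozen diffusion policy.

    Assumptions (not from the paper's code):
      - policy.denoise_step(action, obs, t) performs one DDIM denoising update
      - q_function(obs, action) returns a scalar Q-value estimate
    """
    # Start the reverse diffusion process from Gaussian noise.
    action = torch.randn(policy.action_dim)

    for t in reversed(range(num_steps)):
        # Standard denoising update from the frozen, pretrained policy.
        action = policy.denoise_step(action, obs, t)

        # Steering step: compute the gradient of Q w.r.t. the intermediate action.
        with torch.enable_grad():
            a = action.detach().requires_grad_(True)
            q = q_function(obs, a)
            grad = torch.autograd.grad(q.sum(), a)[0]

        # Nudge the partially denoised action toward higher Q-values.
        action = action + guidance_scale * grad

    return action
```

In this sketch the diffusion policy's parameters are never updated; only the value function (trained with RL) supplies the gradient signal, which matches the paper's claim of adaptation without fine-tuning.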
Submission Number: 18