Diffusion Policy Policy Optimization

Published: 29 Oct 2024, Last Modified: 03 Nov 2024CoRL 2024 Workshop MRM-D OralEveryoneRevisionsBibTeXCC BY 4.0
Keywords: reinforcement learning, diffusion policy
TL;DR: We introduce DPPO, an algorithmic framework and set of best practices for fine-tuning diffusion-based policies in continuous control and robot learning tasks.
Abstract: We introduce Diffusion Policy Policy Optimization, DPPO, an algorithmic framework including best practices for fine-tuning diffusion-based policies (e.g. Diffusion Policy) in continuous control and robot learning tasks using the policy gradient (PG) method from reinforcement learning (RL). PG methods are ubiquitous in training RL policies with other policy parameterizations; nevertheless, they had been conjectured to be less efficient for diffusion-based policies. Surprisingly, we show that DPPO achieves the strongest overall performance and efficiency for fine-tuning in common benchmarks compared to other RL methods for diffusion-based policies and also compared to PG fine-tuning of other policy parameterizations. We further demonstrate the strengths of DPPO in a range of realistic settings, including simulated robotic tasks with pixel observations, and via zero-shot deployment of simulation-trained policies on robot hardware.
Submission Number: 15
Loading