Abstract: We present Sobolev diffusion policy (SDP), a novel framework that effectively combines the strengths of policy learning and trajectory optimization. On the one hand, we build upon diffusion policy, an expressive imitation learning method based on diffusion probabilistic generative models. On the other hand, we use gradient-based trajectory optimization solvers to generate locally optimal trajectories and leverage their associated feedback gains to enrich Sobolev training with first-order information. Combining the two, we introduce a first-order loss for diffusion-based policies. The framework alternates between collecting trajectories with a solver warm-started by the policy and training the policy on them. Through comprehensive experiments, we demonstrate that the Sobolev component significantly reduces the number of trajectories required for the policy to converge globally. First-order information both prevents overfitting, despite the use of very few samples, and mitigates the compounding-error issue of imitation-based policies, even when predicting torques for tasks requiring high-frequency control. We benchmark the benefits of SDP on various robotics tasks of increasing complexity. In particular, SDP proves stable over extended horizons with fewer diffusion steps, reducing the overall rollout time compared to vanilla diffusion models. When used to compute initial guesses for trajectory optimization, it reduces the solving time by a factor of 2 to 20.
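The core ingredient described above, a Sobolev (first-order) imitation loss that matches both the solver's actions and its feedback gains, can be illustrated with a minimal sketch. The snippet below assumes, for readability, a deterministic policy mapping observations to actions (the paper applies the idea to a diffusion-based policy); the names `expert_action`, `expert_gain`, and the weight `w1` are illustrative, not taken from the paper.

```python
import torch
import torch.nn.functional as F


def sobolev_loss(policy, obs, expert_action, expert_gain, w1=1.0):
    """Sketch of a Sobolev-style imitation loss.

    Zeroth-order term: match the solver's locally optimal action.
    First-order term: match the Jacobian of the predicted action w.r.t.
    the observation against the solver's feedback gains (expert_gain,
    shape [batch, n_act, n_obs]).
    """
    obs = obs.clone().requires_grad_(True)
    pred = policy(obs)  # predicted action/torque, shape [batch, n_act]

    # Zeroth-order (behavior cloning) term.
    l0 = F.mse_loss(pred, expert_action)

    # First-order term: build the Jacobian row by row with autograd.
    jac_rows = [
        torch.autograd.grad(pred[..., i].sum(), obs, create_graph=True)[0]
        for i in range(pred.shape[-1])
    ]
    jac = torch.stack(jac_rows, dim=-2)  # shape [batch, n_act, n_obs]
    l1 = F.mse_loss(jac, expert_gain)

    return l0 + w1 * l1
```

In this sketch, the feedback gains play the role of target derivatives, so the policy is supervised not only on what action to take but also on how that action should respond to perturbations of the observation, which is what the abstract credits for reduced overfitting and compounding error.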