Latent Diffusion Planning for Imitation Learning

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 Spotlight Poster · CC BY 4.0
TL;DR: We learn a diffusion-based planner and inverse dynamics model in latent space for imitation learning.
Abstract: Recent progress in imitation learning has been enabled by policy architectures that scale to complex visuomotor tasks, multimodal distributions, and large datasets. However, these methods often rely on learning from large amounts of expert demonstrations. To address this shortcoming, we propose Latent Diffusion Planning (LDP), a modular approach consisting of a planner that can leverage action-free demonstrations and an inverse dynamics model that can leverage suboptimal data, both operating over a learned latent space. First, we learn a compact latent space through a variational autoencoder, enabling effective forecasting of future states in image-based domains. Then, we train a planner and an inverse dynamics model with diffusion objectives. By separating planning from action prediction, LDP can benefit from the denser supervision signals of suboptimal and action-free data. On simulated visual robotic manipulation tasks, LDP outperforms state-of-the-art imitation learning approaches, which cannot leverage such additional data.
Lay Summary: Imitation learning has become a recipe for success in learning robot models that can accomplish complex vision-based tasks. However, imitation methods require large datasets, particularly high-quality ones collected by experts. To address this limitation, we propose Latent Diffusion Planning (LDP), a modular approach consisting of a high-level planner that can leverage action-free demonstrations and a low-level inverse dynamics model that can leverage suboptimal data, both operating over a learned latent space. First, we learn a latent space through a model that compresses high-dimensional images into lower-dimensional vectors, enabling effective computation. Then, we train the high-level and low-level models over this latent space using diffusion objectives. The planner predicts desired future states, and the low-level inverse dynamics model takes pairs of states and predicts actions. This allows LDP to leverage suboptimal and action-free data and to outperform existing methods that cannot use such data.
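The modular structure described above (an encoder producing compact latent states, a diffusion planner forecasting future latents, and an inverse dynamics model decoding actions from latent-state pairs) can be summarized with a brief sketch. This is a hypothetical illustration, not the authors' implementation; all module names, dimensions, and the simplified denoising loop are assumptions made for clarity.

```python
# Minimal sketch of the LDP-style modular pipeline (hypothetical, for illustration).
import torch
import torch.nn as nn

LATENT_DIM, ACTION_DIM, HORIZON = 64, 7, 8  # assumed sizes, not from the paper

class Encoder(nn.Module):
    """VAE-style encoder: compresses an image into a compact latent vector."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, LATENT_DIM),
        )

    def forward(self, image):
        return self.net(image)

class LatentPlanner(nn.Module):
    """Denoiser that predicts a sequence of future latent states from a noisy plan,
    conditioned on the current latent; trainable on action-free demonstrations."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(LATENT_DIM * (HORIZON + 1) + 1, 512), nn.ReLU(),
            nn.Linear(512, LATENT_DIM * HORIZON),
        )

    def forward(self, noisy_plan, current_latent, t):
        x = torch.cat([noisy_plan.flatten(1), current_latent, t], dim=-1)
        return self.net(x).view(-1, HORIZON, LATENT_DIM)

class InverseDynamics(nn.Module):
    """Predicts the action linking consecutive latent states;
    trainable on suboptimal, action-labeled data."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * LATENT_DIM, 256), nn.ReLU(),
            nn.Linear(256, ACTION_DIM),
        )

    def forward(self, z_t, z_next):
        return self.net(torch.cat([z_t, z_next], dim=-1))

@torch.no_grad()
def act(encoder, planner, inv_dyn, image, n_denoise_steps=10):
    """Encode the observation, iteratively refine a latent plan, then decode actions."""
    z0 = encoder(image)
    plan = torch.randn(z0.shape[0], HORIZON, LATENT_DIM)
    for step in range(n_denoise_steps):
        # Simplified refinement loop; a real diffusion sampler would follow a
        # noise schedule and re-noise between steps.
        t = torch.full((z0.shape[0], 1), 1.0 - step / n_denoise_steps)
        plan = planner(plan, z0, t)
    states = torch.cat([z0.unsqueeze(1), plan], dim=1)
    return torch.stack(
        [inv_dyn(states[:, i], states[:, i + 1]) for i in range(HORIZON)], dim=1
    )

if __name__ == "__main__":
    enc, planner, inv_dyn = Encoder(), LatentPlanner(), InverseDynamics()
    actions = act(enc, planner, inv_dyn, torch.randn(1, 3, 64, 64))
    print(actions.shape)  # (1, HORIZON, ACTION_DIM)
```

The separation shown here is the point of the design: the planner only ever sees latent states, so it can be trained on action-free videos, while the inverse dynamics model only maps state pairs to actions, so it can absorb suboptimal action-labeled data.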
Primary Area: Applications->Robotics
Keywords: imitation learning, diffusion, planning, robotics
Submission Number: 7928