Keywords: Maximum Entropy Reinforcement Learning, diffusion-based reinforcement learning, diffusion reinforcement learning, reinforcement learning
TL;DR: We propose a tractable lower bound for optimizing diffusion-based policies in the maximum entropy reinforcement learning setting.
Abstract: Maximum entropy reinforcement learning (MaxEnt-RL) has become the standard approach to RL due to its beneficial exploration properties. Traditionally, policies are parameterized using Gaussian distributions, which significantly limits their representational capacity. Diffusion-based policies offer a more expressive alternative, yet integrating them into MaxEnt-RL poses challenges, primarily due to the intractability of computing their marginal entropy.
To overcome this, we propose Diffusion-Based Maximum Entropy RL (DIME). DIME leverages recent advances in approximate inference with diffusion models to derive a lower bound on the maximum entropy objective.
Additionally, we propose a policy iteration scheme that provably converges to the optimal diffusion policy. Our method enables the use of expressive diffusion-based policies while retaining the principled exploration benefits of MaxEnt-RL, significantly outperforming other diffusion-based methods on challenging high-dimensional control benchmarks. It is also competitive with state-of-the-art non-diffusion-based RL methods while requiring fewer algorithmic design choices and smaller update-to-data ratios, which reduces computational cost.
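For readers unfamiliar with the setting, the following is a minimal sketch of the standard MaxEnt-RL objective and of one generic way a tractable entropy lower bound can arise for a policy defined by a diffusion chain. The notation is assumed for illustration and is not taken from the paper: discount $\gamma$, temperature $\alpha$, diffusion steps $a^{0:N}$ with $a^0$ the executed action, and an auxiliary distribution $q$. The paper's actual bound may differ in form.

```latex
% Standard MaxEnt-RL objective (notation assumed, not from the abstract):
\begin{align}
J(\pi) &= \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}
  \Big( r(s_t, a_t) + \alpha\, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \Big)\right].
\end{align}
% A diffusion policy defines the action marginal only through an integral
% over the intermediate steps, so its entropy has no closed form:
\begin{align}
\pi(a^{0} \mid s) &= \int \pi(a^{0:N} \mid s)\, \mathrm{d}a^{1:N}.
\end{align}
% Generic variational lower bound, valid for any auxiliary q because
% KL( \pi(a^{1:N} | a^0, s) || q(a^{1:N} | a^0, s) ) >= 0:
\begin{align}
\mathcal{H}\big(\pi(a^{0} \mid s)\big)
  &\ge \mathbb{E}_{\pi(a^{0:N} \mid s)}\!\left[
     \log q(a^{1:N} \mid a^{0}, s) - \log \pi(a^{0:N} \mid s) \right].
\end{align}
```

Substituting a bound of this kind for the entropy term yields a lower bound on $J(\pi)$ that involves only quantities that can be evaluated along a sampled diffusion chain.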
Confirmation: I understand that authors of each paper submitted to EWRL may be asked to review 2-3 other submissions to EWRL.
Serve As Reviewer: ~Onur_Celik1
Track: Fast Track: published work
Publication Link: celik@kit.edu
Submission Number: 138