Track: full paper
Keywords: Maximum Entropy Reinforcement Learning, Diffusion
TL;DR: We introduce a novel method to effectively train diffusion-based policies in online reinforcement learning based on the maximum entropy RL objective.
Abstract: Maximum entropy reinforcement learning (MaxEnt-RL) has become the standard approach to RL due to its beneficial exploration properties. Traditionally, policies are parameterized using Gaussian distributions, which significantly limits their representational capacity. Diffusion-based policies offer a more expressive alternative, yet integrating them into MaxEnt-RL poses challenges, primarily due to the intractability of computing their marginal entropy.
To overcome this, we propose Diffusion-Based Maximum Entropy RL (DIME). DIME leverages recent advances in approximate inference with diffusion models to derive a lower bound on the maximum entropy objective.
Additionally, we propose a policy iteration scheme that provably converges to the optimal diffusion policy. Our method enables the use of expressive diffusion-based policies while retaining the principled exploration benefits of MaxEnt-RL, significantly outperforming other diffusion-based methods on challenging high-dimensional control benchmarks. It is also competitive with state-of-the-art non-diffusion-based RL methods while requiring fewer algorithmic design choices and smaller update-to-data ratios, reducing computational complexity.
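As a rough illustration of the intractability the abstract refers to, the sketch below states the MaxEnt-RL objective and a generic variational lower bound on the entropy of a diffusion policy. The notation (chain length K, reverse chain p_\theta, reference noising process q) is ours, and the bound is only indicative of the general construction, not necessarily the exact bound derived in the paper.

% MaxEnt-RL objective: expected return plus an entropy bonus with temperature \alpha
\[
J(\pi) \;=\; \mathbb{E}_{\pi}\Big[\textstyle\sum_{t \ge 0} \gamma^{t}\,\big(r(s_t, a_t) \;+\; \alpha\,\mathcal{H}\big(\pi(\cdot \mid s_t)\big)\big)\Big].
\]
% A diffusion policy draws a_0 through a learned reverse chain
% p_\theta(a_{0:K} \mid s) = p(a_K) \prod_{k=1}^{K} p_\theta(a_{k-1} \mid a_k, s),
% so the marginal density \pi_\theta(a_0 \mid s) = \int p_\theta(a_{0:K} \mid s)\, da_{1:K}
% and hence the entropy \mathcal{H}(\pi_\theta(\cdot \mid s)) are intractable.
% Bounding the conditional entropy of the latents a_{1:K} by a cross entropy
% against a tractable reference (noising) process q(a_{1:K} \mid a_0, s) gives
\[
\mathcal{H}\big(\pi_\theta(\cdot \mid s)\big)
\;\ge\;
\mathbb{E}_{p_\theta(a_{0:K} \mid s)}\Big[\log \tfrac{q(a_{1:K} \mid a_0, s)}{p_\theta(a_{0:K} \mid s)}\Big],
\]
% which can be estimated from sampled chains and, when substituted for the exact
% entropy, yields a tractable lower bound on the MaxEnt-RL objective.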
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Presenter: ~Denis_Blessing1
Format: Yes, the presenting author will definitely attend in person because they are attending ICLR for other complementary reasons.
Funding: No, the presenting author of this submission does *not* fall under ICLR’s funding aims, or has sufficient alternate funding.
Submission Number: 21