Bellman Diffusion: Generative Modeling as Learning a Linear Operator in the Distribution Space

Published: 06 Mar 2025, Last Modified: 24 Apr 2025 · FPI-ICLR2025 Poster · CC BY 4.0
Keywords: Deep Generative Models, Distributional Reinforcement Learning, Markov Decision Processes
TL;DR: A deep generative model for Markov decision process applications such as planning and distributional reinforcement learning.
Abstract: Deep Generative Models (DGMs), such as Diffusion Models, have achieved promising performance in approximating complex data distributions. However, they are rarely applied to distributional Reinforcement Learning (RL), which remains dominated by classical histogram-based methods that inevitably incur discretization errors. In this paper, we highlight that this gap stems from the non-linearity of modern DGMs, which conflicts with the linear structure of the Bellman equation, a structure that is key to training RL models efficiently. To address this, we introduce \emph{Bellman Diffusion}, a new DGM that preserves the necessary linearity by modeling both the gradient and scalar fields of the target distribution. We propose a novel divergence-based training technique to optimize the neural network proxies and introduce a new stochastic differential equation for sampling. With these innovations, Bellman Diffusion is guaranteed to converge to the target distribution. Our experiments show that Bellman Diffusion not only achieves accurate field estimations and serves as an effective image generator, but also converges $1.5\times$ faster than traditional histogram-based baselines in distributional RL tasks. This work paves the way for the effective integration of DGMs into Markov Decision Process (MDP) applications, enabling more advanced decision-making frameworks.
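To make the core idea concrete, below is a minimal, hypothetical sketch of the "model the gradient field and scalar field with neural proxies, then sample with an SDE" recipe on a 1-D toy distribution. The kernel-density targets, network sizes, and the Langevin-style sampler drift $g(x)/s(x)$ are illustrative assumptions, not the paper's exact divergence-based objective or sampling SDE.

```python
import torch
import torch.nn as nn

# Two small MLPs as proxies: s_net(x) ~ p(x) (scalar field), g_net(x) ~ dp/dx (gradient field).
def mlp(out_dim=1):
    return nn.Sequential(nn.Linear(1, 64), nn.SiLU(),
                         nn.Linear(64, 64), nn.SiLU(),
                         nn.Linear(64, out_dim))

s_net, g_net = mlp(), mlp()
opt = torch.optim.Adam(list(s_net.parameters()) + list(g_net.parameters()), lr=1e-3)

data = torch.randn(2048, 1) * 0.5 + 1.0   # toy target samples
h = 0.2                                   # assumed KDE bandwidth

def kde_fields(x):
    """Kernel-density estimates of p(x) and its gradient at query points x (assumed stand-in targets)."""
    diff = (x[:, None, :] - data[None, :, :]) / h
    w = torch.exp(-0.5 * diff ** 2) / (h * (2 * torch.pi) ** 0.5)
    p = w.mean(dim=1)                      # scalar-field target
    dp = (-diff / h * w).mean(dim=1)       # gradient-field target
    return p, dp

# Field matching: regress both proxies onto the estimated fields.
for step in range(2000):
    x = torch.rand(256, 1) * 6 - 2         # query points covering the support
    p_t, dp_t = kde_fields(x)
    loss = ((s_net(x) - p_t) ** 2).mean() + ((g_net(x) - dp_t) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

# Sampling: Euler-Maruyama with drift g(x)/s(x) ~ grad log p(x),
# a Langevin-style stand-in for the paper's sampling SDE.
x = torch.randn(512, 1)
for _ in range(500):
    with torch.no_grad():
        drift = g_net(x) / s_net(x).clamp_min(1e-3)
        x = x + 0.01 * drift + (2 * 0.01) ** 0.5 * torch.randn_like(x)
```

Because both proxies enter the objective linearly in the distribution, expectations of the fields under a mixture of distributions are mixtures of the fields, which is the property that lets the Bellman equation be applied directly.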
Submission Number: 20