Keywords: Deep Generative Models; Markov Decision Processes; Foundation of Distributional Reinforcement Learning
TL;DR: The first Deep Generative Model for Markov Decision Processes, with applications such as planning and distributional Reinforcement Learning
Abstract: Deep Generative Models (DGMs), including Score-based Generative Models, have made significant progress in approximating complex continuous distributions. However, their application to Markov Decision Processes (MDPs), particularly in distributional Reinforcement Learning (RL), remains underexplored. The field is still dominated by classical histogram-based methods, which suffer from discretization errors that lead to instability and slower convergence. This work highlights that this gap stems from the nonlinear operators used in modern DGMs, which map neural network functions to the target distribution. These nonlinearities conflict with the linearity required by the Bellman equation, which relates the return distribution of a state to a linear combination of future states' return distributions. To address this, we introduce Bellman Diffusion, a new DGM that preserves the necessary linearity by modeling both the gradient and scalar fields. We propose a novel divergence-based training technique to optimize neural network proxies and introduce a new stochastic differential equation for sampling. With these innovations, Bellman Diffusion is guaranteed to converge to the target distribution. Our experiments show that Bellman Diffusion not only achieves accurate field estimations and serves as an effective image generator, but also converges $1.5\times$ faster than traditional histogram-based baselines in distributional RL tasks. This work paves the way for the effective integration of DGMs into MDP applications, enabling more advanced decision-making frameworks.
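The linearity requirement mentioned in the abstract can be made concrete with a minimal sketch. Under the distributional Bellman equation, the return density at a state is a probability-weighted mixture of (shifted, scaled) successor return densities; for histogram (categorical) representations on a shared support, this mixture is literally a weighted sum of probability vectors. The names and values below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Shared support (atoms) of the histogram densities.
atoms = np.linspace(0.0, 10.0, 51)

# Hypothetical return densities of two successor states s'_a, s'_b.
p_next_a = np.full(51, 1 / 51)               # uniform over the support
p_next_b = np.zeros(51); p_next_b[25] = 1.0  # point mass at one atom

# Hypothetical transition probabilities P(s'_a | s), P(s'_b | s).
w_a, w_b = 0.3, 0.7

# Linearity: the state's return density is the weighted sum of the
# successor densities -- an operation a nonlinear DGM mapping does not
# commute with, which is the conflict the abstract describes.
p_state = w_a * p_next_a + w_b * p_next_b

assert np.isclose(p_state.sum(), 1.0)  # still a valid density
```

This is why a model whose output distribution depends nonlinearly on its network parameters cannot directly absorb the Bellman backup, whereas modeling linear-in-distribution quantities (such as the gradient and scalar fields proposed here) can.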
Supplementary Material: zip
Primary Area: generative models
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 6909