Deep Generative Models (DGMs), including Score-based Generative Models, have made significant progress in approximating complex continuous distributions. However, their application to Markov Decision Processes (MDPs), particularly in distributional Reinforcement Learning (RL), remains underexplored. The field is still dominated by classical histogram-based methods, which suffer from discretization errors that lead to instability and slower convergence. This work highlights that the gap stems from the nonlinear operators used in the modeling of modern DGMs, which map neural network functions to the target distribution. These nonlinearities conflict with the linearity required by the Bellman equation, which relates the return distribution of a state to a linear combination of the return distributions of future states. To address this, we introduce Bellman Diffusion, a new DGM that preserves the necessary linearity by modeling both the gradient and scalar fields. We propose a novel divergence-based training technique to optimize neural network proxies and introduce a new stochastic differential equation for sampling. With these innovations, Bellman Diffusion is guaranteed to converge to the target distribution. Our experiments show that Bellman Diffusion not only achieves accurate field estimation and serves as an effective image generator, but also converges $1.5\times$ faster than traditional histogram-based baselines in distributional RL tasks. This work paves the way for the effective integration of DGMs into MDP applications, enabling more advanced decision-making frameworks.
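To make the linearity requirement concrete, the standard distributional Bellman equation (written here in common distributional-RL notation, not the paper's own symbols) expresses the return distribution $\mu_\pi(s)$ at a state $s$ under policy $\pi$ as

$$
\mu_\pi(s) \;=\; \sum_{a} \pi(a \mid s) \sum_{s',\, r} P(s', r \mid s, a)\, \bigl(f_{r,\gamma}\bigr)_{\#}\, \mu_\pi(s'),
\qquad f_{r,\gamma}(x) = r + \gamma x,
$$

where $\bigl(f_{r,\gamma}\bigr)_{\#}$ denotes the pushforward of a distribution under the affine map $f_{r,\gamma}$ and $\gamma$ is the discount factor. The right-hand side is a convex, hence linear, combination of transformed successor-state return distributions; a DGM whose parameterization acts nonlinearly on the modeled distribution cannot in general commute with this mixture, which is the conflict the abstract refers to.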