Bellman Unbiasedness: Toward Provably Efficient Distributional Reinforcement Learning with General Value Function Approximation
Keywords: Distributional Reinforcement Learning, Regret Analysis, General Value Function Approximation
Abstract: Distributional reinforcement learning improves performance by effectively capturing environmental stochasticity.
However, existing research on its regret analysis has relied heavily on structural assumptions that are difficult to implement in practice.
In particular, little attention has been paid to the infeasibility of handling an infinite-dimensional distribution.
To overcome this infeasibility, we present a regret analysis of distributional reinforcement learning with general value function approximation in a finite episodic Markov decision process setting through *statistical functional dynamic programming*.
First, we introduce the key notion of *Bellman unbiasedness*, which is essential for exactly learnable and provably efficient updates.
Our theoretical results demonstrate that the only way to exactly capture statistical information, including nonlinear statistical functionals, is by representing the infinite-dimensional return distribution with a finite number of moment functionals.
Second, we propose a provably efficient algorithm, *SF-LSVI*, that achieves a tight regret bound of $\tilde{O}(d_E H^{\frac{3}{2}}\sqrt{K})$, where $H$ is the horizon, $K$ is the number of episodes, and $d_E$ is the eluder dimension of the function class.
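The intuition behind favoring moment functionals can be illustrated with a small numerical sketch. It is not the paper's formal definition of Bellman unbiasedness and not the SF-LSVI algorithm. It only shows a standard fact: a moment is linear in the distribution, so the moment of a mixture of next-state return distributions equals the weighted average of the per-state moments, whereas a nonlinear functional such as the median does not decompose this way. The two return distributions below, and all names in the code, are hypothetical values chosen for illustration.

```python
import statistics

def second_moment(samples):
    """Empirical second moment E[G^2] of a list of return samples."""
    return sum(g * g for g in samples) / len(samples)

# Hypothetical return distributions at two next states,
# each reached with probability 1/2 (assumed for illustration).
returns_a = [0.0, 0.0, 10.0]
returns_b = [1.0, 2.0, 3.0]

# Both lists have three equally weighted atoms, so concatenation
# represents the 50/50 mixture exactly.
mixture = returns_a + returns_b

# Moment functional: mixture value equals the average of per-state values.
moment_of_mixture = second_moment(mixture)
average_of_moments = 0.5 * second_moment(returns_a) + 0.5 * second_moment(returns_b)

# Median (a nonlinear functional): the two quantities disagree.
median_of_mixture = statistics.median(mixture)
average_of_medians = 0.5 * statistics.median(returns_a) + 0.5 * statistics.median(returns_b)

print("second moment:", moment_of_mixture, "vs", average_of_moments)
print("median:       ", median_of_mixture, "vs", average_of_medians)
```

In this toy case the two second-moment values coincide, while the median of the mixture differs from the average of the medians, so a target built from per-state medians would be biased for the mixture's median.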
Supplementary Material: zip
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 6300