Keywords: Reinforcement learning, Bayesian sampling, Stochastic gradient MCMC, Value function approximation
TL;DR: We propose an efficient and scalable sampling framework for reinforcement learning, which enables uncertainty quantification and addresses the local trap issue suffered by the existing training approaches.
Abstract: Value-based algorithms have achieved great successes in solving Reinforcement Learning problems via minimizing the mean squared Bellman error (MSBE). Temporal-difference (TD) algorithms such as Q-learning and SARSA often use stochastic gradient descent based optimization approaches to estimate the value function parameters, but fail to quantify their uncertainties. In our work, under the Kalman filtering paradigm, we establish a novel and scalable sampling framework based on stochastic gradient Markov chain Monte Carlo, which allows us to efficiently generate samples from the posterior distribution of deep neural network parameters. For TD-learning with both linear and nonlinear function approximation, we prove that the proposed algorithm converges to a stationary distribution, which allows us to measure uncertainties of the value function and its parameters.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Supplementary Material: zip
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Reinforcement Learning (eg, decision and control, planning, hierarchical RL, robotics)