Value-Distributional Model-Based Reinforcement Learning

Published: 20 Jul 2023, Last Modified: 29 Aug 2023 (EWRL16)
Keywords: Model-Based Reinforcement Learning, Uncertainty, Distributional Reinforcement Learning
Abstract: Quantifying uncertainty about a policy's long-term performance is key in sequential decision-making tasks. We study the problem from a Bayesian perspective, where the goal is to learn the posterior distribution over value functions induced by parameter (epistemic) uncertainty of the Markov decision process. Previous work restricts the analysis to a few moments of the distribution over values or imposes a particular distribution shape (e.g., Gaussians). Inspired by distributional reinforcement learning, we introduce a Bellman operator whose fixed point is the value distribution function. Based on our theory, we propose Epistemic Quantile-Regression (EQR), a model-based algorithm that learns a value distribution function that can be used for policy optimization. Evaluation across several continuous-control tasks shows performance benefits with respect to established model-based and model-free algorithms.
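To make the core idea concrete, below is a minimal illustrative Python sketch, not the paper's EQR implementation: it estimates quantiles of the value distribution induced by epistemic (posterior) uncertainty over a toy MDP's parameters via quantile regression. The toy posterior, state/quantile counts, learning rate, and all names are assumptions made for illustration only.

# Minimal illustrative sketch (NOT the paper's EQR algorithm): estimate
# quantiles of the value distribution induced by epistemic (posterior)
# uncertainty over MDP parameters, using quantile-regression updates.
# The toy posterior and all hyperparameters below are assumptions.
import numpy as np

rng = np.random.default_rng(0)
gamma = 0.9
n_states = 2
n_quantiles = 5
taus = (np.arange(n_quantiles) + 0.5) / n_quantiles  # quantile midpoints

def sample_mdp():
    """Sample a transition matrix and reward vector from a simple posterior."""
    P = rng.dirichlet(np.ones(n_states) * 2.0, size=n_states)  # P[s, s']
    r = rng.normal(loc=[0.0, 1.0], scale=0.3)                  # r[s]
    return P, r

def policy_value(P, r):
    """Exact state values of a fixed policy under one sampled model."""
    return np.linalg.solve(np.eye(n_states) - gamma * P, r)

# theta[s, i] estimates the tau_i-quantile of V(s) under the posterior.
theta = np.zeros((n_states, n_quantiles))
lr = 0.05

for step in range(5000):
    V = policy_value(*sample_mdp())                 # one posterior value sample per state
    # Pinball (quantile-regression) gradient step: theta moves up with
    # probability tau and down with probability 1 - tau, so it converges
    # to the tau-quantiles of the value distribution.
    indicator = (V[:, None] < theta).astype(float)  # 1 if sample falls below estimate
    theta -= lr * (indicator - taus[None, :])

print("Estimated value-distribution quantiles per state:")
print(np.round(theta, 3))

In EQR itself, the quantile estimates would be learned through the proposed distributional Bellman operator rather than from exact per-model value solutions; exact values are used here only to keep the sketch self-contained.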