Finite-state Offline Reinforcement Learning with Moment-based Bayesian Epistemic and Aleatoric Uncertainties

Published: 20 Jun 2023 (Last Modified: 11 Oct 2023)
Venue: SODS 2023 Poster
Keywords: Offline Reinforcement Learning, Markov Decision Processes, Bayesian Reinforcement Learning, Uncertainty Quantification, Aleatoric and Epistemic Uncertainty
TL;DR: We carry out uncertainty disentanglement (aleatoric/epistemic quantification) and uncertainty-aware control by exploiting the discrete-state nature of the MDPs considered.
Abstract: Reinforcement learning (RL) agents can learn complex sequential decision-making and control strategies, often exceeding human expert performance. For real-world deployment, it is essential from a risk, safety-critical, and human-interaction perspective that agents communicate how confident or uncertain they are in the outcomes of their actions and account for this uncertainty in their decision-making. We assemble a complete pipeline for modelling uncertainty in the finite, discrete-state setting of offline RL. First, we use methods from Bayesian RL to capture the posterior uncertainty in the environment model parameters given the available data. Next, for given samples from the environment posterior, we compute exact values of the return distribution's standard deviation, taken as our measure of uncertainty, without requiring the quantile-based or similar approximations of conventional distributional RL; this lets us decompose the agent's uncertainty into epistemic and aleatoric components more efficiently than previous approaches. Building on this, we construct an RL agent that quantifies both types of uncertainty and uses its epistemic uncertainty belief to inform its optimal policy through a novel stochastic gradient-based optimisation process. We demonstrate the improved uncertainty quantification and Bayesian value optimisation performance of our agent in simple, interpretable gridworlds, and confirm its scalability by applying it to a clinical decision support system (the AI Clinician) that makes real-time sepsis treatment recommendations in intensive care units. Finally, we address the limitations that arise with inference for larger-scale MDPs by proposing a sparse, conservative dynamics model.
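To make the moment-based pipeline in the abstract concrete, here is a minimal sketch (ours, not the authors' code) of the uncertainty decomposition for a finite-state MDP under a fixed policy. It assumes, purely for illustration, deterministic per-state rewards, a Dirichlet posterior over each transition row built from hypothetical visit counts, and a discount factor below one. For each posterior sample it solves the first- and second-moment Bellman equations exactly as linear systems (no quantile approximation), then splits the total return variance into aleatoric and epistemic parts via the law of total variance.

```python
# Minimal sketch: exact return variance per posterior sample, then
# aleatoric/epistemic decomposition. Illustrative assumptions only:
# deterministic per-state rewards r, fixed policy (so transitions form
# a Markov chain), Dirichlet posterior from hypothetical counts.
import numpy as np

rng = np.random.default_rng(0)
S, gamma, K = 4, 0.9, 500             # states, discount, posterior samples
r = rng.normal(size=S)                 # assumed per-state rewards
counts = rng.integers(1, 10, (S, S))   # hypothetical transition counts

def exact_value_and_variance(P, r, gamma):
    """Solve the first- and second-moment Bellman equations in closed form.

    V = r + gamma * P V
    M = r^2 + 2 gamma * r * (P V) + gamma^2 * P M,   Var = M - V^2
    """
    I = np.eye(len(r))
    V = np.linalg.solve(I - gamma * P, r)
    b = r**2 + 2 * gamma * r * (P @ V)
    M = np.linalg.solve(I - gamma**2 * P, b)
    return V, M - V**2

# Sample transition matrices from the Dirichlet posterior and compute
# the exact per-sample return mean and variance.
Vs, Vars = [], []
for _ in range(K):
    P = np.vstack([rng.dirichlet(row) for row in counts])
    V, var = exact_value_and_variance(P, r, gamma)
    Vs.append(V)
    Vars.append(var)
Vs, Vars = np.array(Vs), np.array(Vars)

# Law of total variance: aleatoric = expected within-model variance,
# epistemic = variance of the value estimate across posterior samples.
aleatoric = Vars.mean(axis=0)
epistemic = Vs.var(axis=0)
print("aleatoric:", np.round(aleatoric, 3))
print("epistemic:", np.round(epistemic, 3))
```

Because the second moment satisfies its own linear Bellman equation, each posterior sample costs only two linear solves; the paper's agent additionally optimises the policy against the epistemic term by stochastic gradient methods, which this sketch does not attempt.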
Submission Number: 19