Finite-state Offline Reinforcement Learning with Moment-based Bayesian Epistemic and Aleatoric Uncertainties
Keywords: Offline Reinforcement Learning, Markov Decision Processes, Bayesian Reinforcement Learning, Uncertainty Quantification, Aleatoric and Epistemic Uncertainty
TL;DR: We carry out uncertainty disentanglement (quantifying aleatoric and epistemic uncertainty) and uncertainty-aware control by exploiting the discrete-state nature of the MDPs considered.
Abstract: We assemble here a complete pipeline for modelling uncertainty in the finite, discrete-state setting of offline reinforcement learning (RL).
First, we use methods from Bayesian RL to capture the posterior uncertainty in environment model parameters given the available data.
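To make this first step concrete, the following is a minimal sketch of how such a posterior can be maintained in the finite, discrete-state setting, assuming a standard Dirichlet-categorical conjugate model over the transition probabilities (the specific prior used in the paper is not stated in this abstract); the toy data, variable names, and pseudo-count `prior_alpha` are illustrative only.

```python
import numpy as np

# Minimal sketch (not the paper's code): Dirichlet-categorical posterior over the
# transition dynamics of a finite-state MDP, the usual conjugate choice in
# Bayesian RL for discrete state-action spaces.
n_states, n_actions = 4, 2
prior_alpha = 1.0  # symmetric Dirichlet prior pseudo-count (assumed)

# transition_counts[s, a, s'] = number of observed (s, a, s') transitions in the offline data
transition_counts = np.zeros((n_states, n_actions, n_states))
offline_data = [(0, 1, 2), (2, 0, 3), (0, 1, 2), (3, 1, 0)]  # toy (s, a, s') tuples
for s, a, s_next in offline_data:
    transition_counts[s, a, s_next] += 1

# The posterior over each row of the transition matrix is Dirichlet(prior + counts);
# drawing one sample yields one plausible MDP consistent with the data.
posterior_alpha = prior_alpha + transition_counts
rng = np.random.default_rng(0)
sampled_P = np.stack([
    [rng.dirichlet(posterior_alpha[s, a]) for a in range(n_actions)]
    for s in range(n_states)
])  # shape (n_states, n_actions, n_states); each sampled_P[s, a] sums to 1
```

Repeating the final sampling step gives an ensemble of candidate MDPs whose spread reflects the epistemic uncertainty left by the offline data.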
Next, for given samples from the environment posterior, we compute exact values of the return distribution's standard deviation, taken as our measure of uncertainty, which lets us decompose the agent's uncertainty into its epistemic and aleatoric components.
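This kind of decomposition is commonly expressed through the law of total variance over the parameter posterior, and per-model return variances in a finite MDP can be obtained exactly from Bellman-style equations for the first two moments of the return; the identities below are a standard formulation of this idea (assuming a deterministic reward $r(s,a)$ for brevity), not a quotation of the paper's derivation.

```latex
% Epistemic/aleatoric split via the law of total variance over the posterior p(\theta \mid D):
\operatorname{Var}[G]
  = \underbrace{\mathbb{E}_{\theta \sim p(\theta \mid D)}\big[\operatorname{Var}[G \mid \theta]\big]}_{\text{aleatoric}}
  + \underbrace{\operatorname{Var}_{\theta \sim p(\theta \mid D)}\big[\mathbb{E}[G \mid \theta]\big]}_{\text{epistemic}}.

% For a fixed sampled MDP \theta, the per-model variance follows exactly from Bellman
% equations for the first and second moments of the return G started from state s:
V^{\pi}_{\theta}(s) = \sum_{a} \pi(a \mid s) \sum_{s'} P_{\theta}(s' \mid s, a)
    \big[ r(s,a) + \gamma V^{\pi}_{\theta}(s') \big],
\qquad
M^{\pi}_{\theta}(s) = \sum_{a} \pi(a \mid s) \sum_{s'} P_{\theta}(s' \mid s, a)
    \big[ r(s,a)^{2} + 2\gamma\, r(s,a)\, V^{\pi}_{\theta}(s') + \gamma^{2} M^{\pi}_{\theta}(s') \big],

% so that  \operatorname{Var}[G \mid \theta, s] = M^{\pi}_{\theta}(s) - V^{\pi}_{\theta}(s)^{2}.
```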
This allows us to build an RL agent that quantifies both types of uncertainty and utilises its uncertain belief to inform its optimal policy through a novel stochastic gradient-based optimisation process.
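As an illustration of what gradient-based optimisation against a posterior-sampled value can look like (not the paper's actual algorithm, whose objective and gradient estimator are not given in this abstract), the sketch below performs gradient ascent on a Monte-Carlo estimate of the posterior-expected start-state value of a softmax policy, using a finite-difference gradient purely for brevity; the random rewards and stand-in posterior samples are placeholders.

```python
import numpy as np

# Illustrative sketch only: stochastic gradient ascent on a Monte-Carlo estimate of
# the posterior-expected start-state value of a softmax tabular policy.
rng = np.random.default_rng(0)
n_states, n_actions, gamma, start = 4, 2, 0.9, 0
rewards = rng.uniform(size=(n_states, n_actions))            # assumed known rewards
posterior_P = rng.dirichlet(np.ones(n_states),                # stand-in posterior samples
                            size=(20, n_states, n_actions))   # shape (K, S, A, S')

def value(theta, P):
    """Exact start-state value of the softmax policy theta in the sampled MDP P."""
    pi = np.exp(theta) / np.exp(theta).sum(axis=1, keepdims=True)  # (S, A)
    r_pi = (pi * rewards).sum(axis=1)                              # (S,)
    P_pi = np.einsum('sa,sap->sp', pi, P)                          # (S, S)
    v = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)
    return v[start]

theta = np.zeros((n_states, n_actions))
lr, eps = 0.5, 1e-4
for step in range(200):
    # Subsample posterior MDPs each step, making the gradient estimate stochastic.
    batch = posterior_P[rng.choice(len(posterior_P), size=5, replace=False)]
    grad = np.zeros_like(theta)
    for i in range(n_states):              # central finite differences of the
        for a in range(n_actions):         # batch-averaged value (for brevity)
            bump = np.zeros_like(theta)
            bump[i, a] = eps
            grad[i, a] = np.mean([(value(theta + bump, P) - value(theta - bump, P)) / (2 * eps)
                                  for P in batch])
    theta += lr * grad                     # ascend the estimated Bayesian value
```

In the same spirit, a risk-sensitive variant would subtract a multiple of the return standard deviation from the sampled values before averaging, so that the learned policy trades expected return against the quantified uncertainty.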
We illustrate the agent's uncertainty quantification and Bayesian value optimisation in simple, interpretable gridworlds, and confirm its scalability by applying it to a clinical decision support system (the AI Clinician), which makes real-time recommendations for sepsis treatment in intensive care units. Finally, we address the limitations that arise when performing inference in larger-scale MDPs by proposing a sparse, conservative dynamics model.