Risk-Averse Bayes-Adaptive Reinforcement Learning

Marc Rigter; Bruno Lacerda; Nick Hawes

Published: 09 Nov 2021, Last Modified: 05 May 2023

Venue: NeurIPS 2021 Poster

Keywords: reinforcement learning, planning, model-based Bayesian reinforcement learning, risk

TL;DR: Addresses risk-sensitive optimisation in the context of model-based Bayesian reinforcement learning.

Abstract: In this work, we address risk-averse Bayes-adaptive reinforcement learning. We pose the problem of optimising the conditional value at risk (CVaR) of the total return in Bayes-adaptive Markov decision processes (MDPs). We show that a policy optimising CVaR in this setting is risk-averse to both the epistemic uncertainty due to the prior distribution over MDPs, and the aleatoric uncertainty due to the inherent stochasticity of MDPs. We reformulate the problem as a two-player stochastic game and propose an approximate algorithm based on Monte Carlo tree search and Bayesian optimisation. Our experiments demonstrate that our approach significantly outperforms baseline approaches for this problem.
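The sketch below makes the CVaR objective in the abstract concrete. It is a minimal illustration, not the authors' algorithm: it implements neither Monte Carlo tree search nor Bayesian optimisation. Two conventions are assumptions here. CVaR at level alpha is taken to be the expected value of the worst alpha-fraction of returns (the lower tail). The helper names `cvar_from_samples` and `cvar_dual` are hypothetical. The second function uses the standard dual representation of CVaR, in which an adversary reweights outcome probabilities by factors in [0, 1/alpha]. This is the kind of adversarial structure a two-player game reformulation can exploit, but the paper's exact game may differ. The demo draws the "true" MDP from a hypothetical two-point prior (epistemic uncertainty) and then adds transition noise (aleatoric uncertainty), so the lower tail reflects both sources.

```python
import numpy as np


def cvar_from_samples(returns, alpha):
    """Empirical CVaR_alpha of returns: mean of the worst alpha-fraction (lower tail)."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    r = np.sort(np.asarray(returns, dtype=float))
    n = len(r)
    k = alpha * n                      # tail size measured in samples (may be fractional)
    full = int(np.floor(k))
    total = r[:full].sum()
    if full < n and k > full:
        total += (k - full) * r[full]  # partial weight on the boundary sample
    return total / k


def cvar_dual(values, probs, alpha):
    """Dual (adversarial) form of CVaR for a discrete distribution.

    The adversary picks weights xi with 0 <= xi <= 1/alpha and sum(p * xi) = 1
    to minimise the reweighted expectation sum(p * xi * value).
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    values = np.asarray(values, dtype=float)
    probs = np.asarray(probs, dtype=float)
    xi = np.zeros_like(probs)
    budget = 1.0                       # remaining perturbed mass sum(p * xi)
    for i in np.argsort(values):       # worst outcomes first
        if probs[i] <= 0:
            continue
        # Greedy is optimal for this linear programme: saturate the worst outcomes.
        xi[i] = min(1.0 / alpha, budget / probs[i])
        budget -= probs[i] * xi[i]
        if budget <= 1e-12:
            break
    return float(np.dot(probs * xi, values)), xi


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical prior: two equally likely MDPs with mean return 0 or 10 (epistemic).
    means = rng.choice([0.0, 10.0], size=10_000)
    # Stochastic transitions add noise to the return within each MDP (aleatoric).
    returns = means + rng.normal(0.0, 1.0, size=10_000)
    print("mean return:", returns.mean())
    print("CVaR_0.1   :", cvar_from_samples(returns, 0.1))
```

The mean return sits between the two candidate MDPs, while CVaR at alpha = 0.1 depends only on the unfavourable MDP's noisy returns. An agent maximising CVaR in this toy setting is therefore penalised for both the chance of being in the bad MDP and the spread of outcomes within it.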

Code Of Conduct: I certify that all co-authors of this work have read and commit to adhering to the NeurIPS Statement on Ethics, Fairness, Inclusivity, and Code of Conduct.

Supplementary Material: pdf

Code: zip
