Risk-Averse Bayes-Adaptive Reinforcement Learning

21 May 2021, 20:42 (edited 21 Jan 2022) · NeurIPS 2021 Poster · Readers: Everyone
  • Keywords: reinforcement learning, planning, model-based bayesian reinforcement learning, risk
  • TL;DR: Addresses risk-sensitive optimisation in the model-based Bayesian reinforcement learning setting.
  • Abstract: In this work, we address risk-averse Bayes-adaptive reinforcement learning. We pose the problem of optimising the conditional value at risk (CVaR) of the total return in Bayes-adaptive Markov decision processes (MDPs). We show that a policy optimising CVaR in this setting is risk-averse to both the epistemic uncertainty due to the prior distribution over MDPs and the aleatoric uncertainty due to the inherent stochasticity of MDPs. We reformulate the problem as a two-player stochastic game and propose an approximate algorithm based on Monte Carlo tree search and Bayesian optimisation. Our experiments demonstrate that our approach significantly outperforms baseline approaches for this problem. (An illustrative sketch of the CVaR objective follows this list.)
  • Supplementary Material: pdf
  • Code Of Conduct: I certify that all co-authors of this work have read and commit to adhering to the NeurIPS Statement on Ethics, Fairness, Inclusivity, and Code of Conduct.
  • Code: zip
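As a quick illustration of the CVaR objective referred to in the abstract, the sketch below estimates the conditional value at risk at level alpha from a sample of returns. This is a minimal, hypothetical example: the function name `cvar`, the NumPy-based implementation, and the synthetic return distribution are assumptions for illustration only, not the authors' algorithm or released code (which combines Monte Carlo tree search with Bayesian optimisation; see the linked zip).

```python
import numpy as np

def cvar(returns, alpha=0.1):
    """Empirical CVaR_alpha: the mean of the worst alpha-fraction of returns.

    Lower returns are worse, so CVaR is the expected return conditional on
    falling at or below the alpha-quantile (the value at risk).
    """
    returns = np.asarray(returns, dtype=float)
    var = np.quantile(returns, alpha)   # value at risk (alpha-quantile)
    tail = returns[returns <= var]      # worst-case tail of the return distribution
    return tail.mean()

# Hypothetical usage: returns collected by rolling out a policy in MDPs drawn
# from the prior, so the sample mixes epistemic and aleatoric uncertainty.
rng = np.random.default_rng(0)
sampled_returns = rng.normal(loc=10.0, scale=3.0, size=10_000)
print(cvar(sampled_returns, alpha=0.05))
```

A risk-averse policy maximises this tail expectation rather than the mean return, which is why it hedges against both unlucky MDP draws from the prior and unlucky transitions within an MDP.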