Risk-aware Bayesian Reinforcement Learning for Cautious Exploration

Rohan Narayan Langford Mitta; Hosein Hasanbeig; Daniel Kroening; Alessandro Abate

Published: 05 Dec 2022, Last Modified: 05 May 2023, MLSW2022, Readers: Everyone

Abstract: This paper addresses the problem of maintaining safety during training in Reinforcement Learning (RL), so that safety-constraint violations remain bounded at every point during learning. Since enforcing safety during training may limit the agent's exploration, we propose a new architecture that manages the trade-off between efficient exploration and safety maintenance. As exploration progresses, we use Bayesian inference to update Dirichlet-Categorical models of the transition probabilities of the Markov decision process (MDP) that describes the agent's behavior in the environment. We then propose a way to approximate the moments of the agent's belief about the risk that arises from its local action selection. We show that this approach can be easily coupled with RL, provide rigorous theoretical guarantees, and present experimental results that showcase the performance of the overall architecture.
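
The abstract does not spell out the update rule or the moment approximation. The following is a minimal sketch, assuming a finite state space, a symmetric Dirichlet prior over successor states, and a fixed set of unsafe states, of the conjugate Dirichlet-Categorical update and of the mean and variance of the belief over the one-step probability of entering an unsafe state. The names `DirichletTransitionModel`, `prior_alpha` and `one_step_risk_moments` are hypothetical. It covers only the one-step case, where the moments are exact because summing Dirichlet components over the unsafe set yields a Beta marginal; the paper's own approximation is not reproduced here.

```python
from collections import defaultdict


class DirichletTransitionModel:
    """Dirichlet-Categorical belief over P(s' | s, a) for a finite MDP.

    Hypothetical sketch: a symmetric Dirichlet prior with concentration
    `prior_alpha` is placed on every successor state of every (s, a) pair.
    """

    def __init__(self, states, prior_alpha=1.0):
        self.states = list(states)
        self.prior_alpha = prior_alpha
        # Observed transition counts: counts[(s, a)][s_next]
        self.counts = defaultdict(lambda: defaultdict(int))

    def update(self, s, a, s_next):
        # Conjugate Bayesian update: the posterior Dirichlet parameters are
        # the prior parameters plus the observed transition counts.
        self.counts[(s, a)][s_next] += 1

    def alpha(self, s, a):
        # Posterior Dirichlet parameters for the successor distribution of (s, a).
        c = self.counts[(s, a)]
        return {t: self.prior_alpha + c[t] for t in self.states}

    def one_step_risk_moments(self, s, a, unsafe):
        """Posterior mean and variance of P(next state in `unsafe` | s, a).

        If p ~ Dir(alpha), the sum of p over the unsafe set follows
        Beta(alpha_unsafe, alpha0 - alpha_unsafe), so both moments are exact.
        """
        alpha = self.alpha(s, a)
        alpha0 = sum(alpha.values())
        alpha_unsafe = sum(alpha[t] for t in self.states if t in unsafe)
        mean = alpha_unsafe / alpha0
        var = mean * (1.0 - mean) / (alpha0 + 1.0)
        return mean, var


if __name__ == "__main__":
    model = DirichletTransitionModel(["s0", "s1", "bad"], prior_alpha=1.0)
    for _ in range(8):
        model.update("s0", "right", "s1")
    model.update("s0", "right", "bad")
    mean, var = model.one_step_risk_moments("s0", "right", unsafe={"bad"})
    print(f"risk mean={mean:.4f}, variance={var:.6f}")
```

With a uniform prior of 1 over three states, eight observed moves to `s1` and one to `bad` give posterior parameters (1, 9, 2), so the estimated one-step risk is 2/12 and its variance shrinks as more transitions are observed. A cautious agent could, for instance, compare an upper quantile built from these two moments against a risk threshold before taking an action; that use is likewise an illustrative assumption.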
