Keywords: Safe reinforcement learning, safety, Markov games, stochastic games
Abstract: Exploring in an unknown system can place an agent in dangerous situations, exposing it to potentially catastrophic hazards. Many current approaches to safe learning in reinforcement learning (RL) lead to a trade-off between safe exploration and fulfilling the task. Although these methods may incur fewer safety violations, they often also reduce task performance. In this paper, we take the first step in introducing a generation of RL solvers that learn to minimise safety violations while maximising the task reward to the extent that can be tolerated by safe policies. Our approach uses a new two-player framework for safe RL called DESTA. The core of DESTA is a novel game between two RL agents: a Safety Agent, which is delegated the task of minimising safety violations, and a Task Agent, whose goal is to maximise the reward set by the environment task. The Safety Agent can selectively take control of the system at any given point to prevent safety violations, while the Task Agent is free to execute its actions at all other states. This framework enables the Safety Agent to learn to take actions that minimise future safety violations (during and after training) by performing safe actions at certain states, while the Task Agent performs actions that maximise task performance everywhere else. We demonstrate DESTA's ability to tackle challenging tasks and compare it against state-of-the-art RL methods on Safety Gym benchmarks, which simulate real-world physical systems, and OpenAI's Lunar Lander.
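The abstract describes the DESTA control flow only at a high level; the short Python sketch below illustrates one way that flow could look, with the Safety Agent deciding state-by-state whether to override the Task Agent's action. The toy policies, the risk-threshold intervention rule, and the state representation are assumptions made purely for illustration, not the authors' implementation.

# Minimal sketch of the two-agent control flow described in the abstract.
# The policies, the intervention rule, and the toy state below are assumptions.
import random

class TaskAgent:
    """Stand-in policy that maximises the environment's task reward."""
    def act(self, state):
        return random.choice(["left", "right", "forward"])

class SafetyAgent:
    """Stand-in policy that minimises (future) safety violations."""
    def __init__(self, risk_threshold=0.5):
        self.risk_threshold = risk_threshold  # hypothetical intervention rule

    def estimated_risk(self, state):
        # Placeholder for a learned estimate of future safety violations.
        return state["hazard_proximity"]

    def wants_control(self, state):
        # Take control only at states judged risky enough to need intervention.
        return self.estimated_risk(state) > self.risk_threshold

    def act(self, state):
        return "brake"  # a conservatively safe action

def select_action(state, task_agent, safety_agent):
    """Execute the Safety Agent's action where it intervenes, else the Task Agent's."""
    if safety_agent.wants_control(state):
        return safety_agent.act(state)   # Safety Agent takes control at this state
    return task_agent.act(state)         # Task Agent acts everywhere else

if __name__ == "__main__":
    task, safety = TaskAgent(), SafetyAgent()
    state = {"hazard_proximity": 0.8}
    print(select_action(state, task, safety))  # -> "brake" (Safety Agent in control)

In this reading, the two agents optimise separate objectives, and the executed policy is their composition; the learned intervention rule is what lets safety be enforced at specific states without constraining the Task Agent elsewhere.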