Safer Reinforcement Learning by Going Off-policy: a Benchmark

Published: 28 Jun 2024, Last Modified: 25 Jul 2024 · NextGenAISafety 2024 Poster · CC BY 4.0
Keywords: Safe Reinforcement Learning, Off-policy, Continuous Control
TL;DR: A benchmark of off-policy reinforcement learning algorithms applied to the SafeRL domain.
Abstract: Avoiding safety-constraint violations while learning to solve tasks is the central concern of Safe Reinforcement Learning (SafeRL). Most prior studies address SafeRL with on-policy algorithms, which obtain stable results at the expense of sample efficiency. In this paper, we study SafeRL from the off-policy perspective. We argue that off-policy RL algorithms are better suited for SafeRL because reducing the number of environment samples also reduces the number of safety penalties incurred during training. We show that off-policy algorithms achieve better safety metrics than on-policy competitors at the same performance level, and we provide a benchmark of 6 modern off-policy algorithms evaluated on 30 environments from the state-of-the-art SafetyGymnasium environment set.
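For context, the sketch below illustrates the per-step safety cost that SafetyGymnasium tasks expose alongside the usual reward, which is the quantity an off-policy SafeRL agent must keep low while maximizing return. It is a minimal sketch, assuming the safety_gymnasium package's six-value step signature; the environment id and the random-action placeholder are illustrative, not part of the paper's benchmark code.

```python
# Minimal sketch: stepping a SafetyGymnasium task while tracking episodic
# return and safety cost. Assumes env.step returns a separate per-step cost
# (safety_gymnasium API); the environment id is illustrative.
import safety_gymnasium

env = safety_gymnasium.make("SafetyPointGoal1-v0")
obs, info = env.reset(seed=0)

episode_return, episode_cost = 0.0, 0.0
for _ in range(1000):
    action = env.action_space.sample()  # placeholder for an off-policy agent's action
    obs, reward, cost, terminated, truncated, info = env.step(action)
    episode_return += reward
    episode_cost += cost  # safety penalty the agent should keep below a budget
    if terminated or truncated:
        obs, info = env.reset()

print(f"return={episode_return:.1f}, cost={episode_cost:.1f}")
```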
Submission Number: 69