Provably Safe Reinforcement Learning: Conceptual Analysis, Survey, and Benchmarking

Published: 17 Nov 2023, Last Modified: 17 Sep 2024, Accepted by TMLR
Abstract: Ensuring the safety of reinforcement learning (RL) algorithms is crucial to unlock their potential for many real-world tasks. However, vanilla RL and most safe RL approaches do not guarantee safety. In recent years, several methods have been proposed to provide hard safety guarantees for RL, which is essential for applications where unsafe actions could have disastrous consequences. Nevertheless, there is no comprehensive comparison of these provably safe RL methods. Therefore, we introduce a categorization of existing provably safe RL methods, present the conceptual foundations for both continuous and discrete action spaces, and empirically benchmark existing methods. We categorize the methods based on how they adapt the action: action replacement, action projection, and action masking. Our experiments on an inverted pendulum and a quadrotor stabilization task indicate that action replacement is the best-performing approach for these applications despite its comparatively simple realization. Furthermore, adding a reward penalty each time the safety verification is engaged improved training performance in our experiments. Finally, we provide practical guidance on selecting provably safe RL approaches depending on the safety specification, RL algorithm, and type of action space.
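To make the three categories from the abstract concrete, the following is a minimal, illustrative Python sketch of how a safety layer might adapt an agent's action. It is not the paper's implementation: the safe interval, the fallback action, and the function names are assumptions chosen for illustration; in practice, the safe set would be obtained from a formal verification method.

```python
import numpy as np

# Assumed verified safe interval for a 1D continuous action space
# (hypothetical values; a real safety layer derives this set formally).
SAFE_LOW, SAFE_HIGH = -0.5, 0.5

def is_safe(action: float) -> bool:
    """Check whether an action lies inside the (assumed) verified safe set."""
    return SAFE_LOW <= action <= SAFE_HIGH

def action_replacement(action: float, fallback: float = 0.0) -> float:
    """Replace an unsafe action with a verified safe fallback action."""
    return action if is_safe(action) else fallback

def action_projection(action: float) -> float:
    """Project an unsafe action onto the closest point of the safe set."""
    return float(np.clip(action, SAFE_LOW, SAFE_HIGH))

def action_masking(q_values: np.ndarray, safe_mask: np.ndarray) -> int:
    """Discrete actions: mask out unsafe actions before greedy selection."""
    masked = np.where(safe_mask, q_values, -np.inf)
    return int(np.argmax(masked))

if __name__ == "__main__":
    a = 0.9  # unsafe continuous action proposed by the policy
    print(action_replacement(a))   # -> 0.0 (fallback)
    print(action_projection(a))    # -> 0.5 (closest safe action)
    q = np.array([1.2, 0.3, 0.8])
    print(action_masking(q, np.array([False, True, True])))  # -> 2
```

The reward-penalty variant mentioned in the abstract would, under the same assumptions, subtract a fixed penalty from the reward whenever `is_safe` returns False and the safety layer intervenes.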
Certifications: Survey Certification
Submission Length: Long submission (more than 12 pages of main content)
Changes Since Last Submission: In summary, we made these changes for the camera-ready version:
- We split the "soft constraints" paragraph in Sec. 1.1 into two paragraphs and added the suggested line of literature where temporal logic is used to systematically synthesize the reward.
- Following the editor's previous suggestion that we should discuss the impact of the three provably safe RL classes on exploration strategies, we gathered the previously scattered discussion into a new section in the conceptual analysis part of the paper: Sec. 2.4. This section discusses how the three classes change the probability density function of the action output. We added Figure 2 to visualize this.
- Based on the editor's questions, we revisited the "safety of a system" paragraph. This resulted in improved definitions and notation, a clear separation of the definitions from the information that provides an intuition for them, and the conversion of the assumption into a proposition.
- We added the link to our reproducible CodeOcean capsule to the paper and improved Sec. 4.1 and Sec. 4.2 by more clearly differentiating when we use the continuous and the discrete system.
- We corrected minor grammatical errors and spelling mistakes, and improved the notation.
Code: https://doi.org/10.24433/CO.9209121.v1
Supplementary Material: zip
Assigned Action Editor: ~Florian_Shkurti1
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Submission Number: 1357