Shielding Regular Safety Properties in Reinforcement Learning

15 May 2024 (modified: 06 Nov 2024) · Submitted to NeurIPS 2024 · CC BY 4.0
Keywords: Safe Reinforcement Learning, Model Checking, Shielding
Abstract: To deploy reinforcement learning (RL) systems in real-world scenarios, we need to consider requirements such as safety and constraint compliance, rather than blindly maximizing reward. In this paper we study RL with regular safety properties. We present a constrained problem based on satisfying a regular safety property with high probability, and we compare our setup to some common constrained Markov decision process (CMDP) settings. We also present a meta-algorithm with provable safety guarantees that can be used to shield the agent from violating the regular safety property during training and deployment. We demonstrate the effectiveness and scalability of our framework by evaluating our meta-algorithm in both tabular and deep RL settings.
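To make the shielding idea concrete, below is a minimal Python sketch, not the paper's actual meta-algorithm: a regular safety property is monitored by a deterministic finite automaton (DFA), and the shield filters out actions whose predicted next label would drive the monitor into a rejecting state. All names here (SafetyDFA, shielded_actions, next_label) are illustrative assumptions, and the deterministic one-step label model is a simplification; the paper's setting enforces the property with high probability.

```python
from dataclasses import dataclass, field

@dataclass
class SafetyDFA:
    """DFA monitor for a regular safety property (illustrative sketch)."""
    transitions: dict       # (dfa_state, label) -> dfa_state
    initial: int
    unsafe: set             # rejecting sink states: property violated
    state: int = field(init=False)

    def __post_init__(self):
        self.state = self.initial

    def step(self, label):
        """Advance the monitor on an observed label; return True if still safe."""
        self.state = self.transitions[(self.state, label)]
        return self.state not in self.unsafe

def shielded_actions(dfa, env_state, actions, next_label):
    """Keep only actions whose successor label keeps the monitor safe.

    `next_label(env_state, action)` is an assumed deterministic model of how
    the environment's atomic propositions evolve; in a stochastic setting one
    would instead require safety for every label with positive probability.
    """
    safe = [
        a for a in actions
        if dfa.transitions[(dfa.state, next_label(env_state, a))] not in dfa.unsafe
    ]
    # A shield with provable guarantees would ensure this set is nonempty
    # (e.g., by construction of the product MDP); we simply signal failure.
    if not safe:
        raise RuntimeError("no provably safe action available")
    return safe
```

In use, the agent would pick actions only from `shielded_actions(...)` and call `dfa.step(...)` on each observed label, so the same monitor constrains behavior during both training and deployment, as the abstract describes.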
Supplementary Material: zip
Primary Area: Reinforcement learning
Submission Number: 15393