Keywords: Safe Reinforcement Learning, Safe Learning, Safe Exploration
Abstract: Safe reinforcement learning (RL) aims to maximize expected cumulative rewards while satisfying safety constraints, making it well-suited for safety-critical applications. In this paper, we address the setting where the safety of state-action pairs is unknown a priori, with the goal of learning an optimal policy while keeping the learning process as safe as possible. To this end, we propose a novel approach that guarantees almost-sure safety by progressively expanding an exploration set, leveraging previously verified safe state-action pairs and a predictive Gaussian Process (GP) model. We provide theoretical guarantees on asymptotic convergence to the optimal policy and a bound on the online regret. Numerical results on benchmark problems with both discrete and continuous state spaces show that our approach achieves superior safety during learning and effectively converges to optimal policies.
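The abstract describes expanding an exploration set using a GP trained on previously verified safe state-action pairs. The snippet below is a minimal illustrative sketch of that general idea, not the authors' algorithm: the safety signal, threshold, confidence parameter `beta`, and helper `expand_safe_set` are all hypothetical, and a scikit-learn GP stands in for whatever model the paper actually uses.

```python
# Illustrative sketch of GP-based safe-set expansion (assumptions, not the
# paper's method): a scalar safety value is observed for verified pairs, and a
# candidate pair is certified safe when its GP lower confidence bound clears a
# threshold.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def expand_safe_set(observed_pairs, observed_safety, candidate_pairs,
                    safety_threshold=0.0, beta=2.0):
    """Return candidates whose GP lower confidence bound exceeds the threshold.

    observed_pairs:  (n, d) state-action pairs already verified safe
    observed_safety: (n,)   measured safety values for those pairs
    candidate_pairs: (m, d) unverified state-action pairs
    """
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-3)
    gp.fit(observed_pairs, observed_safety)
    mean, std = gp.predict(candidate_pairs, return_std=True)
    lcb = mean - beta * std          # pessimistic estimate of safety
    return candidate_pairs[lcb >= safety_threshold]

# Toy usage: 1-D features, safety grows with distance from a hazard near 0.
rng = np.random.default_rng(0)
seed_pairs = rng.uniform(1.0, 2.0, size=(10, 1))   # known-safe seed set
seed_safety = seed_pairs[:, 0] - 0.5               # hypothetical safety signal
candidates = rng.uniform(0.0, 3.0, size=(50, 1))
newly_safe = expand_safe_set(seed_pairs, seed_safety, candidates)
print(f"certified {len(newly_safe)} of {len(candidates)} candidates as safe")
```

In this sketch the lower confidence bound plays the role of an "almost-sure" safety check: a candidate is only added to the exploration set when even the pessimistic GP prediction is above the threshold, so the set grows outward from the verified pairs rather than jumping to unexplored regions.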
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 16163