Keywords: Safe Reinforcement Learning, Safe Learning, Safe Exploration
Abstract: Safe reinforcement learning (RL) aims to maximize expected cumulative rewards while satisfying safety constraints, making it well-suited for safety-critical applications. In this paper, we address the setting where the safety of state-action pairs is unknown a priori, with the goal of learning an optimal policy while keeping the learning process as safe as possible. To this end, we propose a novel approach that guarantees almost-sure safety by progressively expanding an exploration set, leveraging previously verified safe state-action pairs and a predictive Gaussian Process (GP) model. We provide theoretical guarantees on asymptotic convergence to the optimal policy and a bound on the online regret. Numerical results on benchmark problems with both discrete and continuous state spaces show that our approach achieves superior safety during learning and effectively converges to optimal policies.
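The abstract describes expanding an exploration set using a GP trained on previously verified safe state-action pairs. The snippet below is a minimal illustrative sketch of that general idea, not the authors' algorithm: the safety signal, threshold, confidence parameter `beta`, and helper `expand_safe_set` are all hypothetical, and a scikit-learn GP stands in for whatever model the paper actually uses.

```python
# Illustrative sketch of GP-based safe-set expansion (assumptions, not the
# paper's method): a scalar safety value is observed for verified pairs, and a
# candidate pair is certified safe when its GP lower confidence bound clears a
# threshold.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def expand_safe_set(observed_pairs, observed_safety, candidate_pairs,
                    safety_threshold=0.0, beta=2.0):
    """Return candidates whose GP lower confidence bound exceeds the threshold.

    observed_pairs:  (n, d) state-action pairs already verified safe
    observed_safety: (n,)   measured safety values for those pairs
    candidate_pairs: (m, d) unverified state-action pairs
    """
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-3)
    gp.fit(observed_pairs, observed_safety)
    mean, std = gp.predict(candidate_pairs, return_std=True)
    lcb = mean - beta * std          # pessimistic estimate of safety
    return candidate_pairs[lcb >= safety_threshold]

# Toy usage: 1-D features, safety grows with distance from a hazard near 0.
rng = np.random.default_rng(0)
seed_pairs = rng.uniform(1.0, 2.0, size=(10, 1))   # known-safe seed set
seed_safety = seed_pairs[:, 0] - 0.5               # hypothetical safety signal
candidates = rng.uniform(0.0, 3.0, size=(50, 1))
newly_safe = expand_safe_set(seed_pairs, seed_safety, candidates)
print(f"certified {len(newly_safe)} of {len(candidates)} candidates as safe")
```

In this sketch the lower confidence bound plays the role of an "almost-sure" safety check: a candidate is only added to the exploration set when even the pessimistic GP prediction is above the threshold, so the set grows outward from the verified pairs rather than jumping to unexplored regions.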
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 16163