Runtime Safety through Adaptive Shielding: From Hidden Parameter Inference to Provable Guarantees

ICLR 2026 Conference Submission 14736 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Safe Reinforcement Learning, Adaptive Shielding, Hidden Parameters, Constrained Markov Decision Process, Conformal Prediction
Abstract: Unseen shifts in environment dynamics, driven by hidden parameters such as friction or gravity, can trigger safety risks during deployment. We develop a runtime shielding mechanism for reinforcement learning, building on the formalism of constrained hidden-parameter Markov decision processes. Function encoders enable real-time inference of hidden parameters from observations, allowing both the shield and the underlying policy to adapt online. To further promote safe policy learning, we introduce a safety-regularized objective that augments reward maximization with a bounded safety measure, encouraging the selection of actions that minimize long-term safety violations. The shield constrains the action space by forecasting future safety risks (such as obstacle proximity) and accounts for uncertainty via conformal prediction. We prove that the proposed mechanism satisfies probabilistic safety guarantees and yields optimal policies within the class of safety-compliant policies. Experiments across diverse environments with varying hidden parameters show that our approach reduces safety violations while maintaining effective task-solving performance and achieving robust out-of-distribution generalization.
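The shielding step described in the abstract — inflating a risk forecast by a conformal margin and masking actions that could breach a safety threshold — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the risk values, action names, threshold, and calibration data below are hypothetical, and standard split conformal prediction stands in for whatever calibration procedure the authors use.

```python
import numpy as np

def conformal_quantile(residuals, alpha):
    """Split conformal prediction: the finite-sample-corrected
    (1 - alpha) empirical quantile of calibration residuals."""
    n = len(residuals)
    q_level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    return float(np.quantile(residuals, q_level, method="higher"))

def shield_actions(predicted_risk, q_hat, threshold):
    """Keep only actions whose forecast risk, inflated by the
    conformal margin q_hat, stays below the safety threshold."""
    return [a for a, r in predicted_risk.items() if r + q_hat <= threshold]

# Calibration residuals |true risk - predicted risk| from a held-out
# set (synthetic numbers for illustration).
cal_residuals = np.abs(np.random.default_rng(0).normal(0.0, 0.05, size=200))
q_hat = conformal_quantile(cal_residuals, alpha=0.1)  # 90% coverage

# Hypothetical per-action risk forecasts (e.g., predicted obstacle proximity).
risk = {"left": 0.10, "straight": 0.30, "right": 0.80}
safe = shield_actions(risk, q_hat, threshold=0.5)  # "right" is masked out
```

With probability at least 1 - alpha over the calibration draw, the true risk of each retained action lies below the threshold, which is the probabilistic flavor of guarantee the abstract claims; the policy then selects only among the surviving actions.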
Primary Area: reinforcement learning
Submission Number: 14736