Safe Reinforcement Learning in Black-Box Environments via Adaptive Shielding

Daniel Bethell, Simos Gerasimou, Radu Calinescu, Calum Imrie

Published: 2025, Last Modified: 26 Feb 2026, Venue: ECAI 2025, License: CC BY-SA 4.0

Abstract: Safe exploration by reinforcement learning (RL) agents is critical to their deployment in many real-world scenarios. When prior knowledge of the target domain or task is unavailable, training RL agents in unknown, black-box environments unavoidably carries significant safety risks. Our novel post-shielding approach, ADVICE (Adaptive Shielding with a Contrastive Autoencoder), operates in continuous state and action spaces. It learns to distinguish safe from unsafe features of state-action pairs during training and uses this knowledge to prevent the RL agent from executing actions that are likely to yield hazardous outcomes. Our comprehensive experimental evaluation shows that ADVICE significantly reduces safety violations (by approximately 50%) compared to state-of-the-art safe RL exploration approaches, while maintaining a competitive outcome reward for the synthesised safe policy.
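
To illustrate the general idea of post-shielding mentioned in the abstract, the following is a minimal sketch and not the ADVICE implementation. It assumes a hypothetical classifier `is_unsafe(state, action)` and a hypothetical `sample_action()` for proposing alternatives; in ADVICE the safe/unsafe distinction is learned by a contrastive autoencoder, whose details are not reproduced here. The toy classifier, the one-dimensional state, and the resampling strategy are all assumptions made for illustration.

```python
import random
from typing import Callable, List, Tuple

State = List[float]
Action = List[float]


def post_shield(
    state: State,
    proposed: Action,
    is_unsafe: Callable[[State, Action], bool],
    sample_action: Callable[[], Action],
    max_tries: int = 50,
) -> Tuple[Action, bool]:
    """Filter an action AFTER the agent has chosen it (post-shielding).

    Returns (action_to_execute, intervened). If the proposed action is
    judged safe it passes through unchanged. Otherwise alternatives are
    sampled until one is judged safe. If none is found within max_tries,
    the original proposal is returned with intervened=False
    (an assumed fallback, chosen only to keep the sketch short).
    """
    if not is_unsafe(state, proposed):
        return proposed, False
    for _ in range(max_tries):
        candidate = sample_action()
        if not is_unsafe(state, candidate):
            return candidate, True
    return proposed, False


def toy_is_unsafe(state: State, action: Action) -> bool:
    """Hypothetical stand-in for a learned safety classifier.

    Treats state[0] as a position and action[0] as a displacement;
    the pair is unsafe if the resulting position leaves [-1, 1].
    """
    return abs(state[0] + action[0]) > 1.0


if __name__ == "__main__":
    rng = random.Random(0)
    sampler = lambda: [rng.uniform(-1.0, 1.0)]
    action, intervened = post_shield([0.9], [0.5], toy_is_unsafe, sampler)
    print(action, intervened)
```

In this sketch the shield sits between the policy and the environment, so the agent's learning algorithm is unchanged and only the executed action is altered when the classifier flags the proposal.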

External IDs: dblp:conf/ecai/BethellGCI25