Abstract: Training a model-free reinforcement learning agent requires allowing the agent to sufficiently explore the environment to search for an optimal policy. In safety-constrained environments, utilizing unsupervised exploration or a non-optimal policy may lead the agent to undesirable states, resulting in outcomes that are potentially costly or hazardous for both the agent and the environment. In this paper, we introduce a new exploration framework for navigating grid environments that enables model-free agents to interact with the environment while adhering to safety constraints. Our framework includes a pre-training phase, during which the agent learns to identify potentially unsafe states based on both observable features and specified safety constraints in the environment. Subsequently, a binary classification model is trained to predict those unsafe states in new environments that exhibit similar dynamics. This trained classifier enables model-free agents to determine situations in which random exploration or a suboptimal policy may pose safety risks; in such cases, our framework prompts the agent to follow a predefined safe policy to mitigate the potential for hazardous consequences. We evaluated our framework on three randomly generated grid environments and demonstrated how model-free agents can safely adapt to new tasks and learn optimal policies for new environments. Our results indicate that by defining an appropriate safe policy and utilizing a well-trained model to detect unsafe states, our framework enables a model-free agent to adapt to new tasks and environments with significantly fewer safety violations.
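To make the action-selection rule described above concrete, the following is a minimal sketch (not the paper's implementation) of how a trained unsafe-state classifier could gate between exploratory actions and a predefined safe policy. The names `classifier`, `explore_policy`, and `safe_policy`, as well as their interfaces, are illustrative assumptions rather than the authors' API.

```python
import numpy as np

def choose_action(state_features, classifier, explore_policy, safe_policy):
    """Gate exploration with a learned unsafe-state predictor.

    state_features: observable features of the agent's current situation
    classifier:     binary model predicting 1 if the situation is unsafe
    explore_policy: the agent's exploratory / learned policy
    safe_policy:    predefined fallback policy for risky situations
    (All names and signatures are illustrative; the paper may differ.)
    """
    # Ask the pre-trained classifier whether the situation is potentially unsafe.
    unsafe = classifier.predict(np.asarray(state_features).reshape(1, -1))[0] == 1
    if unsafe:
        # Fall back to the predefined safe policy to avoid a safety violation.
        return safe_policy(state_features)
    # Otherwise, explore freely (e.g., epsilon-greedy over learned Q-values).
    return explore_policy(state_features)
```

Under this assumed interface, the classifier is trained once during the pre-training phase and then reused in new environments with similar dynamics, so the gating check adds only a single prediction per step.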