Joint Learning of Policy with Unknown Temporal Constraints for Safe Reinforcement Learning

Published: 27 Apr 2023, Last Modified: 09 Jul 2023, PRL
Keywords: Safe RL, specification-guided RL, human-in-the-loop RL
TL;DR: This paper proposes a framework that jointly learns safety constraints and optimal RL policies in environments with unknown or undefined safety constraints.
Abstract: In many real-world applications, the safety constraints that reinforcement learning (RL) algorithms must respect are unknown or not explicitly defined. We propose a framework that concurrently learns safety constraints and optimal RL policies in such environments. Our approach combines a logically constrained RL algorithm with an evolutionary algorithm that synthesizes signal temporal logic (STL) specifications. We demonstrate the framework in grid-world environments, where it identifies both acceptable safety constraints and RL policies.
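To make the idea of "evolving an STL specification" concrete, the following is a minimal sketch under loudly stated assumptions; it is not the authors' algorithm and omits the logically constrained RL component entirely. It assumes a single hypothetical STL template G(d >= c), where d(t) is a per-step "distance to nearest hazard cell" signal in a grid world, and assumes traces labeled safe or unsafe (for example, by a human in the loop). The threshold c is found with a simple elitist (mu + lambda)-style evolutionary search whose fitness is the agreement between each label and the sign of the STL robustness. All names (`robustness_always_geq`, `fitness`, `evolve_threshold`) and parameters are hypothetical.

```python
import random
from typing import List, Tuple

# Hypothetical signal: per-step distance to the nearest hazard cell.
Trace = List[float]


def robustness_always_geq(trace: Trace, c: float) -> float:
    """Quantitative STL robustness of G (d >= c) over the whole trace:
    the minimum over time of d(t) - c. Non-negative means satisfied."""
    return min(d - c for d in trace)


def fitness(c: float, data: List[Tuple[Trace, bool]]) -> float:
    """Fraction of labeled traces whose safe/unsafe label agrees with
    whether the trace satisfies G (d >= c)."""
    correct = sum(
        (robustness_always_geq(trace, c) >= 0) == is_safe
        for trace, is_safe in data
    )
    return correct / len(data)


def evolve_threshold(
    data: List[Tuple[Trace, bool]],
    pop_size: int = 20,
    generations: int = 50,
    sigma: float = 0.5,
    seed: int = 0,
) -> float:
    """Elitist evolutionary search for the threshold c of G (d >= c)."""
    rng = random.Random(seed)
    population = [rng.uniform(0.0, 5.0) for _ in range(pop_size)]
    for _ in range(generations):
        # Each parent produces one offspring via Gaussian mutation.
        offspring = [c + rng.gauss(0.0, sigma) for c in population]
        # Keep the pop_size fittest individuals from parents plus offspring.
        pool = population + offspring
        pool.sort(key=lambda c: fitness(c, data), reverse=True)
        population = pool[:pop_size]
    return population[0]
```

On a toy dataset where safe traces never come closer than 2.0 to a hazard and unsafe traces reach 1.0 or less, any threshold in (1.0, 2.0] separates the labels perfectly, and the search should return such a value. A richer version would evolve the formula structure itself (operators, time bounds, predicates), not just one numeric parameter.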
Submission Number: 12