Joint Learning of Temporal Logic Constraints and Policy in RL with Human Feedbacks

Duo XU

13 Jun 2022 (modified: 15 Aug 2022) · OpenReview Anonymous Preprint Blind Submission · Readers: Everyone
Abstract: Deep reinforcement learning (RL) has achieved significant success in artificial domains and in some real-world applications. However, substantial challenges remain, such as learning efficiently under safety constraints. Satisfying safety constraints is a hard requirement in many high-impact application domains. At a suitable level of abstraction, these constraints have rich temporal and logical structure and can be expressed in formal languages such as temporal logics. Previous work, however, assumes these constraints are known, which may not hold in many practical scenarios. In this paper, we study safe RL under an {\it unknown} temporal logic constraint and propose a framework that jointly learns the safety constraint and the policy from human feedback. The proposed framework interleaves two loops: learning the safety constraint and logically constrained RL. Specifically, in the outer loop, a new algorithm based on a temporal logic neural network (TLNN) learns the automaton of the constraint formula from traces labeled by human feedback. To satisfy the safety constraint zero-shot, in the inner loop we use a pre-trained generalizable shield and a logically combined Q function for action selection. We evaluate the proposed framework over various environments and provide an in-depth empirical analysis of the performance of both automaton learning and the safety guarantee, empirically verifying the advantages of our method over previous ones.
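The two-loop structure described in the abstract can be sketched as follows. This is a minimal, hypothetical illustration only: every name here (`TLNN`, `PretrainedShield`, `QAgent`, `human_labeler`, and the automaton interface) is a placeholder introduced for exposition, not the paper's actual API.

```python
# Hypothetical sketch of the interleaved learning loops described above.
# All classes and interfaces (TLNN, PretrainedShield, QAgent, automaton)
# are illustrative placeholders, not the paper's implementation.

def run_episode(env, agent, shield, automaton):
    """Inner loop: one episode of logically constrained RL."""
    state, done = env.reset(), False
    aut_state = automaton.initial_state()
    trace = []
    while not done:
        # The pre-trained generalizable shield masks actions that the
        # current hypothesis automaton deems unsafe (zero-shot safety).
        safe_actions = shield.allowed_actions(state, aut_state)
        # Logically combined Q function: task value plus the value of
        # making progress in the constraint automaton.
        action = max(safe_actions,
                     key=lambda a: agent.q(state, a)
                                   + automaton.value(aut_state, a))
        next_state, reward, done = env.step(action)
        agent.update(state, action, reward, next_state)
        aut_state = automaton.step(aut_state, env.label(next_state))
        trace.append((state, action))
        state = next_state
    return trace


def joint_learning(env, human_labeler, n_rounds, episodes_per_round):
    """Outer loop: refine the constraint automaton from human feedback."""
    tlnn, shield, agent = TLNN(), PretrainedShield(), QAgent()
    traces, labels = [], []
    for _ in range(n_rounds):
        # Learn the automaton of the (unknown) constraint formula from
        # traces labeled safe/unsafe by human feedback.
        automaton = tlnn.fit(traces, labels)
        for _ in range(episodes_per_round):
            trace = run_episode(env, agent, shield, automaton)
            traces.append(trace)
            labels.append(human_labeler(trace))
    return agent, automaton
```

The key design point the sketch conveys is the interleaving: each outer round re-fits the automaton hypothesis from the accumulated human-labeled traces, and the inner RL loop then acts under that hypothesis via the shield and the combined Q function.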