Keywords: Deep Reinforcement Learning, Differentiable Formal Specification Language, Robot Navigation, Robot Planning and Control
TL;DR: This paper uses differentiable formal specifications to constrain the policy updates in hierarchical deep reinforcement learning.
Abstract: Formal logic specifications are a useful tool to describe desired agent behavior and have been explored as a means to shape rewards in Deep Reinforcement Learning (DRL) systems over a variety of problems and domains. Prior work, however, has failed to consider the possibility of making these specifications differentiable, which would yield a more informative signal of the objective via the specification gradient. This paper examines precisely such an approach by exploring a Lagrangian method to constrain policy updates using a differentiable style of temporal logic specifications that associates logic formulae with real-valued quantitative semantics. This constrained learning mechanism is then used in a hierarchical setting where a high-level specification-guided neural network path planner works with a low-level control policy to navigate through planned waypoints. The effectiveness of our approach is demonstrated over four robot dynamics with five different types of Linear Temporal Logic (LTL) specifications. Our demo videos are collected at https://sites.google.com/view/schrl.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Reinforcement Learning (eg, decision and control, planning, hierarchical RL, robotics)