Prescribed Safety Performance Imitation Learning from A Single Expert DatasetDownload PDF

22 Sept 2022 (modified: 13 Feb 2023)ICLR 2023 Conference Withdrawn SubmissionReaders: Everyone
Keywords: safe imitation learning, inverse reinforcement learning
Abstract: Existing safe imitation learning (safe IL) methods mainly focus on learning safe policies that are similar to expert ones, but may fail in applications requiring different safety constraints. In this paper, we propose the Lagrangian Generative Adversarial Imitation Learning (LGAIL) algorithm, which can adaptively learn safe policies from a single expert dataset under diverse prescribed safety constraints. To achieve this, we augment GAIL with safety constraints and then relax it as an unconstrained optimization problem by utilizing a Lagrange multiplier. The Lagrange multiplier enables explicit consideration of the safety and is dynamically adjusted to balance the imitation and safety performance during training. Then, we apply a two-stage optimization framework to solve LGAIL: (1) a discriminator is optimized to measure the similarity between the agent-generated data and the expert ones; (2) forward reinforcement learning is employed to improve the similarity while considering safety concerns enabled by a Lagrange multiplier. Furthermore, theoretical analyses on the convergence and safety of LGAIL demonstrate its capability of adaptively learning a safe policy given prescribed safety constraints. At last, extensive experiments in OpenAI Safety Gym conclude the effectiveness of our approach.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Reinforcement Learning (eg, decision and control, planning, hierarchical RL, robotics)
TL;DR: We propose a method to conduct safe imitation learning with prescribed safety performance.
5 Replies