Converging and Stabilizing Generative Adversarial Imitation Learning

22 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: generative adversarial imitation learning, convergence, stability, control theory
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: Generative adversarial imitation learning (GAIL) is a powerful framework for model-free imitation learning. GAIL extracts a policy from expert demonstrations by training a parameterized policy to fool a discriminator that distinguishes the state-action pairs generated by the learned policy from those of the experts. However, the training process of GAIL exhibits oscillatory behavior, which degrades both its performance and its efficiency. In this paper, we study the stability of GAIL from the perspective of control theory. We formulate the training process of GAIL as a system of differential equations and formally prove that GAIL never approaches the desired equilibrium. We then leverage methodologies from control theory to design control functions that not only drive GAIL to the desired equilibrium but also achieve asymptotic stability in theory. Motivated by these theoretical results, we propose a controlled GAIL algorithm with a modified learning objective for the discriminator. We evaluate our algorithm on MuJoCo tasks. While vanilla GAIL is unstable and fails to reach the expert return on some tasks, our controlled GAIL approaches expert returns on all tasks.
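The abstract does not specify the form of the control functions or the modified discriminator objective, so the snippet below is only a minimal sketch of the general idea: it augments the standard GAIL discriminator loss with a hypothetical control term that pulls the discriminator's output toward the desired equilibrium D(s, a) = 1/2 on policy samples, the point at which expert and policy pairs are indistinguishable. The network architecture, the quadratic penalty, and the coefficient lambda_ctrl are illustrative assumptions, not the paper's actual objective.

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Classifies (state, action) pairs as expert-like or policy-generated."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),  # logit: > 0 means "expert-like"
        )

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([obs, act], dim=-1))

def controlled_disc_loss(disc, expert_batch, policy_batch, lambda_ctrl=0.1):
    """GAIL discriminator loss plus an illustrative control term.

    expert_batch and policy_batch are (obs, act) tensor pairs;
    lambda_ctrl is a hypothetical weight on the control term.
    """
    bce = nn.BCEWithLogitsLoss()
    exp_logits = disc(*expert_batch)
    pol_logits = disc(*policy_batch)
    # Standard GAIL objective: label expert pairs 1, policy pairs 0.
    gail_loss = bce(exp_logits, torch.ones_like(exp_logits)) + \
                bce(pol_logits, torch.zeros_like(pol_logits))
    # Hypothetical control term: a quadratic pull toward the desired
    # equilibrium D(s, a) = 1/2 on policy samples, damping oscillation.
    ctrl = (torch.sigmoid(pol_logits) - 0.5).pow(2).mean()
    return gail_loss + lambda_ctrl * ctrl
```

In a full training loop, this loss would be minimized in place of the standard discriminator update, with the policy still trained against the discriminator-derived reward as in vanilla GAIL.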
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 4867