EvIL: Evolution Strategies for Generalisable Imitation Learning

24 Sept 2023 (modified: 11 Feb 2024). Submitted to ICLR 2024.
Primary Area: reinforcement learning
Keywords: Reinforcement Learning, Inverse Reinforcement Learning, Imitation Learning, Evolutionary Strategies
TL;DR: We present a general framework for imitation learning that can optimise non-differentiable objectives and generate transferable reward functions.
Abstract: We present Evolutionary Imitation Learning (EvIL), a general approach to imitation learning (IL) able to predict agent behaviour across changing environment dynamics. In EvIL, we use Evolution Strategies to jointly meta-optimise the parameters (e.g. reward functions and dynamics) fed to an inner loop reinforcement learning procedure. In effect, this allows us to inherit some of the benefits of the inverse reinforcement learning approach to imitation learning while being significantly more flexible. Specifically, our algorithm can be applied with any policy optimisation method, without requiring the reward or training procedure to be differentiable. Our method succeeds at recovering a reward that induces expert-like behaviour across a variety of environments, even when the environment dynamics are not fully known. We test our method's effectiveness and generalisation capabilities in several tabular environments and continuous control settings and find that it outperforms both offline approaches, like behavioural cloning, and traditional inverse reinforcement learning techniques.
Supplementary Material: pdf
Submission Number: 9179
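
Illustrative sketch: The abstract describes an outer Evolution Strategies loop that meta-optimises reward parameters fed to an inner-loop reinforcement learning procedure, with no need for that procedure to be differentiable. The Python sketch below shows one way such a bi-level loop could look on a toy problem. Everything specific in it is an assumption made for illustration and is not taken from the paper: the 5-state chain MDP, value iteration as the inner loop, action agreement with the expert as the fitness, and an antithetic-sampling ES update. It is not the authors' implementation.

```python
import numpy as np

# Toy deterministic chain MDP (hypothetical, for illustration only).
N_STATES, N_ACTIONS, GAMMA = 5, 2, 0.9

def next_state(s, a):
    # Action 0 moves left, action 1 moves right; the ends of the chain clamp the move.
    return max(s - 1, 0) if a == 0 else min(s + 1, N_STATES - 1)

NEXT = np.array([[next_state(s, a) for a in range(N_ACTIONS)]
                 for s in range(N_STATES)])

def inner_loop_policy(theta, n_iters=100):
    """Inner loop: value iteration under candidate reward theta (paid on arrival).

    Returns the greedy action per state. The argmax makes this step
    non-differentiable, which is why a gradient-free outer loop is used.
    """
    v = np.zeros(N_STATES)
    for _ in range(n_iters):
        q = theta[NEXT] + GAMMA * v[NEXT]  # shape (N_STATES, N_ACTIONS)
        v = q.max(axis=1)
    return q.argmax(axis=1)

def fitness(theta, expert_actions):
    # Fraction of states where the induced policy agrees with the expert.
    return float(np.mean(inner_loop_policy(theta) == expert_actions))

def evolve_reward(expert_actions, generations=30, half_pop=16,
                  sigma=0.5, lr=0.2, seed=0):
    """Outer loop: ES over reward parameters with antithetic perturbations."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(N_STATES)
    best_theta, best_fit = theta.copy(), fitness(theta, expert_actions)
    for _ in range(generations):
        eps = rng.standard_normal((half_pop, N_STATES))
        eps = np.concatenate([eps, -eps])  # antithetic pairs
        candidates = theta + sigma * eps
        fits = np.array([fitness(c, expert_actions) for c in candidates])
        i = int(fits.argmax())
        if fits[i] > best_fit:
            best_theta, best_fit = candidates[i].copy(), float(fits[i])
        # Score-function estimate of the fitness gradient w.r.t. the ES mean.
        centred = fits - fits.mean()
        theta = theta + lr / (len(eps) * sigma) * (eps.T @ centred)
    return best_theta, best_fit

if __name__ == "__main__":
    expert = np.ones(N_STATES, dtype=int)  # hypothetical expert: always move right
    reward, fit = evolve_reward(expert)
    print("recovered reward:", np.round(reward, 2), "agreement:", fit)
```

The inner loop here could be swapped for any policy optimiser, since the outer loop only needs a scalar fitness per candidate. Extending the perturbed parameter vector to include dynamics parameters, as the abstract mentions, would follow the same pattern.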