EvIL: Evolution Strategies for Generalisable Imitation Learning

Silvia Sapora; Chris Lu; Gokul Swamy; Yee Whye Teh; Jakob Nicolaus Foerster

EvIL: Evolution Strategies for Generalisable Imitation Learning

Silvia Sapora, Chris Lu, Gokul Swamy, Yee Whye Teh, Jakob Nicolaus Foerster

24 Sept 2023 (modified: 11 Feb 2024)Submitted to ICLR 2024EveryoneRevisionsBibTeX

Primary Area: reinforcement learning

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Keywords: Reinforcement Learning, Inverse Reinforcement Learning, Imitation Learning, Evolutionary Strategies

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.

TL;DR: We present a general framework for Imitation Learning able to optimise non-differentiable objectives and generate transferable reward functions

Abstract: We present Evolutionary Imitation Learning (EvIL), a general approach to imitation learning (IL) able to predict agent behaviour across changing environment dynamics. In EvIL, we use Evolution Strategies to jointly meta-optimise the parameters (e.g. reward functions and dynamics) fed to an inner loop reinforcement learning procedure. In effect, this allows us to inherit some of the benefits of the inverse reinforcement learning approach to imitation learning while being significantly more flexible. Specifically, our algorithm can be applied with any policy optimisation method, without requiring the reward or training procedure to be differentiable. Our method succeeds at recovering a reward that induces expert-like behaviour across a variety of environments, even when the environment dynamics are not fully known. We test our method's effectiveness and generalisation capabilities in several tabular environments and continuous control settings and find that it outperforms both offline approaches, like behavioural cloning, and traditional inverse reinforcement learning techniques.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.

Supplementary Material: pdf

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 9179

Loading