One Demonstration Imitation Learning

Bradly C. Stadie; Siyan Zhao; Qiqi Xu; Bonnie Li; Lunjun Zhang

25 Sept 2019 (modified: 05 May 2023), ICLR 2020 Conference Withdrawn Submission, Readers: Everyone

Keywords: imitation learning, one shot imitation learning, reinforcement learning, exploration, representation learning

TL;DR: Unsupervised self-imitation algorithm capable of inference from a single expert demonstration.

Abstract: We develop a new algorithm for imitation learning from a single expert demonstration. In contrast to many previous one-shot imitation learning approaches, our algorithm does not assume access to more than one expert demonstration during the training phase. Instead, we leverage an exploration policy to acquire unsupervised trajectories, which are then used to train both an encoder and a context-aware imitation policy. The optimization procedures for the encoder, imitation learner, and exploration policy are all tightly linked. This linking creates a feedback loop in which the exploration policy collects new demonstrations that challenge the imitation learner, while the encoder learns representations that help the imitation policy as much as possible. We evaluate our algorithm on six MuJoCo robotics tasks.
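
The feedback loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract gives no architectures, losses, or update rules, so everything below is an assumption made for illustration. It assumes a 2-D point environment, a linear encoder `E`, a linear context-conditioned policy (`A`, `B`), a behaviour-cloning loss, and an exploration update that simply drifts toward the trajectory the imitator currently fits worst. All names and constants are hypothetical.

```python
import numpy as np

T, DIM, Z_DIM = 5, 2, 2  # horizon, state/action size, context size (hypothetical)


def rollout_explorer(mu, rng, sigma=0.1):
    """Toy exploration policy: pick a velocity near mu and follow it with noise."""
    v = mu + sigma * rng.standard_normal(DIM)
    actions = v + 0.01 * rng.standard_normal((T, DIM))
    # Point dynamics s_{t+1} = s_t + a_t, starting at the origin.
    states = np.vstack([np.zeros(DIM), np.cumsum(actions, axis=0)])
    return states, actions


def clip_norm(g, max_norm=1.0):
    """Rescale a gradient so its norm never exceeds max_norm."""
    n = np.linalg.norm(g)
    return g if n <= max_norm else g * (max_norm / n)


def imitation_loss_and_grads(params, states, actions):
    """Behaviour-cloning loss and gradients on one trajectory (self-imitation)."""
    E, A, B = params
    tau = states.reshape(-1)            # flattened demonstration
    z = E @ tau                         # encoder output (context vector)
    pred = states[:-1] @ A.T + z @ B.T  # a_hat_t = A s_t + B z
    err = pred - actions
    loss = float(np.mean(np.sum(err ** 2, axis=1)))
    g_A = 2 * err.T @ states[:-1] / T
    g_B = 2 * np.outer(err.sum(axis=0), z) / T
    g_E = 2 * np.outer(B.T @ err.sum(axis=0), tau) / T
    return loss, (g_E, g_A, g_B)


def train(num_iters=200, batch=16, lr=0.01, seed=0):
    rng = np.random.default_rng(seed)
    params = [0.1 * rng.standard_normal((Z_DIM, (T + 1) * DIM)),  # encoder E
              0.1 * rng.standard_normal((DIM, DIM)),              # policy A
              0.1 * rng.standard_normal((DIM, Z_DIM))]            # policy B
    mu = np.zeros(DIM)  # mean of the exploration distribution
    history = []
    for _ in range(num_iters):
        trajs = [rollout_explorer(mu, rng) for _ in range(batch)]
        results = [imitation_loss_and_grads(params, s, a) for s, a in trajs]
        losses = np.array([r[0] for r in results])
        # Encoder and imitation policy: descend the mean imitation loss.
        for i in range(3):
            g = np.mean([r[1][i] for r in results], axis=0)
            params[i] = params[i] - lr * clip_norm(g)
        # Exploration policy: drift toward the hardest trajectory (assumed rule).
        hardest_actions = trajs[int(np.argmax(losses))][1]
        mu = np.clip(mu + 0.1 * (hardest_actions.mean(axis=0) - mu), -0.5, 0.5)
        history.append(float(losses.mean()))
    return params, history


def imitate(params, demo_states):
    """Roll out the imitation policy conditioned on a single demonstration."""
    E, A, B = params
    z = E @ demo_states.reshape(-1)
    s = np.zeros(DIM)
    out = [s]
    for _ in range(T):
        s = s + A @ s + B @ z
        out.append(s)
    return np.array(out)
```

At test time, a single expert trajectory would be passed to `imitate` as `demo_states`; in this sketch no expert data is used during `train`, which mirrors the abstract's claim only at the level of structure. Gradient clipping and the bound on `mu` are stability choices for the toy example, not part of the described method.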
