Expert Proximity as Surrogate Rewards for Single Demonstration Imitation Learning

Chia-Cheng Chiang; Chien Feng; Li-Cheng Lan; Wei-Fang Sun; Cho-Jui Hsieh; Chun-Yi Lee

17 Sept 2023 (modified: 11 Feb 2024). Submitted to ICLR 2024.

Primary Area: reinforcement learning

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Keywords: Imitation learning, reinforcement learning, single demonstration imitation learning.

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.

TL;DR: This study presents TDIL, a method designed for single-demonstration imitation learning.

Abstract: This study investigates the challenging single-demonstration imitation learning (IL) setting. In this setting, the learning agent relies solely on a single expert demonstration and operates in an environment that lacks external reward signals, human feedback, or prior analogous knowledge, since obtaining multiple demonstrations or engineering complex reward functions is often infeasible. Given these constraints, the study introduces a methodology termed Transition Discriminator-based IL (TDIL). TDIL aims to increase the density of available reward signals and improve agent performance by incorporating environmental dynamics. It posits that, rather than strictly adhering to a limited expert demonstration, the agent should first aim to reach states proximal to expert behavior. To facilitate this, the study introduces a surrogate reward function approximated by a transition discriminator. TDIL shows promise both in addressing the sparse-reward problem common in single-demonstration IL and in stabilizing the agent's learning process during training. A comprehensive set of experiments across multiple benchmarks validates the effectiveness of TDIL over existing IL methods.
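
Illustrative sketch (not the authors' implementation): the abstract's idea of rewarding states proximal to the expert, rather than only exact matches with the demonstration, can be illustrated on a toy problem. Everything below is an assumption made for illustration: the 5x5 grid world, the names `successors`, `is_transition`, and `surrogate_reward`, and the use of known dynamics as a stand-in for a learned transition discriminator. The definition of proximity used here (an expert state is reachable in one step) is a guess at the general idea, not the definition from the paper.

```python
from typing import Iterable, Set, Tuple

State = Tuple[int, int]

# Hypothetical toy dynamics: a 5x5 grid where each action moves one cell.
GRID = 5
MOVES = [(0, 1), (0, -1), (1, 0), (-1, 0)]


def successors(s: State) -> Set[State]:
    """States reachable from s in one step; moves off the grid are dropped."""
    x, y = s
    out = set()
    for dx, dy in MOVES:
        nx, ny = x + dx, y + dy
        if 0 <= nx < GRID and 0 <= ny < GRID:
            out.add((nx, ny))
    return out


def is_transition(s: State, s_next: State) -> float:
    """Stand-in for a learned transition discriminator (assumption):
    returns 1.0 if s -> s_next is a valid one-step transition, else 0.0."""
    return 1.0 if s_next in successors(s) else 0.0


def surrogate_reward(s: State, expert_states: Iterable[State]) -> float:
    """Reward 1.0 if s lies on the demonstration or if some expert state
    is one step away from s; otherwise 0.0."""
    expert = set(expert_states)
    if s in expert:
        return 1.0
    return max((is_transition(s, e) for e in expert), default=0.0)


# A single hypothetical demonstration: walk along the bottom row.
expert_demo = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)]

all_states = [(x, y) for x in range(GRID) for y in range(GRID)]
exact = sum(1 for s in all_states if s in expert_demo)
dense = sum(surrogate_reward(s, expert_demo) for s in all_states)
print(exact, dense)
```

In this toy setup, exact matching rewards only the 5 demonstrated states, while the proximity-based reward also covers the 5 states in the adjacent row, which is one simple way to see how such a surrogate could make the reward signal denser.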

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.

Supplementary Material: pdf

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 997
