Keywords: offline imitation learning, imperfect demonstration, imitation learning
TL;DR: This paper introduces a simple yet effective data selection method along with a lightweight policy learning algorithm to fully exploit imperfect demonstrations in offline imitation learning.
Abstract: Offline imitation learning (IL) with imperfect data has garnered increasing attention due to the scarcity of expert data in many real-world domains. A fundamental problem in this scenario is how to extract good behaviors from noisy demonstrations. Current approaches to this problem generally rely on state-action similarity to the expert, neglecting the valuable information in (potentially abundant) diverse behaviors that deviate from the given expert demonstrations. In this paper, we introduce a simple yet effective data selection method that identifies positive behaviors based on their "resultant states", a more informative criterion that enables explicit utilization of dynamics information and the extraction of both expert-like and beneficial diverse behaviors. Further, we devise a lightweight constrained behavior cloning algorithm capable of correctly leveraging the expert and selected data. We term our proposed method iLID and evaluate it on a suite of complex and high-dimensional offline IL benchmarks, including MuJoCo and Adroit tasks. The results demonstrate that iLID achieves state-of-the-art performance, significantly outperforming existing methods, often by 2-5x, while maintaining a runtime comparable to behavior cloning (BC).
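To make the resultant-state idea concrete, below is a minimal sketch of selecting transitions from an imperfect dataset by how close each transition's next state is to the expert's visited states. The Euclidean distance, the `keep_ratio` parameter, and the array-based interface are illustrative assumptions, not the paper's actual criterion or implementation.

```python
import numpy as np

def select_by_resultant_state(offline_next_states, expert_states, keep_ratio=0.2):
    """Score each imperfect transition by the distance from its resultant (next)
    state to the nearest expert state, then keep the top-scoring fraction.

    offline_next_states: (N, d) next states from the imperfect dataset
    expert_states:       (M, d) states visited in expert demonstrations
    keep_ratio:          fraction of transitions to retain (hypothetical knob)
    Returns indices of the selected transitions.
    """
    # Pairwise distances between resultant states and expert states (N, M)
    dists = np.linalg.norm(
        offline_next_states[:, None, :] - expert_states[None, :, :], axis=-1
    )
    # A transition is scored higher if its resultant state lies near expert states
    scores = -dists.min(axis=1)
    k = max(1, int(keep_ratio * len(scores)))
    return np.argsort(scores)[-k:]
```

The selected transitions could then be combined with the expert data in a weighted or constrained behavior cloning objective; the exact form of that constraint is described in the paper, not here.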
Supplementary Material: zip
Submission Number: 4676