Keywords: offline imitation learning, imperfect demonstration, imitation learning
TL;DR: This paper introduces a simple yet effective data selection method along with a lightweight policy learning algorithm to fully exploit imperfect demonstrations in offline imitation learning.
Abstract: Offline imitation learning (IL) with imperfect data has garnered increasing attention due to the scarcity of expert data in many real-world domains. A fundamental problem in this scenario is how to extract good behaviors from noisy demonstrations. Current approaches to this problem generally rely on state-action similarity to the expert, neglecting the valuable information in (potentially abundant) diverse behaviors that deviate from the given expert demonstrations. In this paper, we introduce a simple yet effective data selection method that identifies positive behaviors based on their "resultant states", a more informative criterion that enables explicit utilization of dynamics information and the extraction of both expert-like and beneficial diverse behaviors. Further, we devise a lightweight constrained behavior cloning algorithm capable of correctly leveraging the expert and selected data. We term our proposed method iLID and evaluate it on a suite of complex and high-dimensional offline IL benchmarks, including MuJoCo and Adroit tasks. The results demonstrate that iLID achieves state-of-the-art performance, significantly outperforming existing methods, often by 2-5x, while maintaining a runtime comparable to behavior cloning (BC).
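To make the resultant-state idea concrete, below is a minimal sketch of selecting transitions from an imperfect dataset by how close each transition's next state is to the expert's visited states. The Euclidean distance, the `keep_ratio` parameter, and the array-based interface are illustrative assumptions, not the paper's actual criterion or implementation.

```python
import numpy as np

def select_by_resultant_state(offline_next_states, expert_states, keep_ratio=0.2):
    """Score each imperfect transition by the distance from its resultant (next)
    state to the nearest expert state, then keep the top-scoring fraction.

    offline_next_states: (N, d) next states from the imperfect dataset
    expert_states:       (M, d) states visited in expert demonstrations
    keep_ratio:          fraction of transitions to retain (hypothetical knob)
    Returns indices of the selected transitions.
    """
    # Pairwise distances between resultant states and expert states (N, M)
    dists = np.linalg.norm(
        offline_next_states[:, None, :] - expert_states[None, :, :], axis=-1
    )
    # A transition is scored higher if its resultant state lies near expert states
    scores = -dists.min(axis=1)
    k = max(1, int(keep_ratio * len(scores)))
    return np.argsort(scores)[-k:]
```

The selected transitions could then be combined with the expert data in a weighted or constrained behavior cloning objective; the exact form of that constraint is described in the paper, not here.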
Supplementary Material: zip
Submission Number: 4676