VILD: Variational Imitation Learning with Diverse-quality Demonstrations

Voot Tangkaratt; Bo Han; Mohammad Emtiyaz Khan; Masashi Sugiyama

25 Sept 2019 (modified: 22 Jun 2025) | ICLR 2020 Conference Blind Submission | Readers: Everyone

Keywords: Imitation learning, inverse reinforcement learning, noisy demonstrations

TL;DR: We propose an imitation learning method that learns from diverse-quality demonstrations collected from demonstrators with different levels of expertise.

Abstract: The goal of imitation learning (IL) is to learn a good policy from high-quality demonstrations. However, the quality of demonstrations in reality can be diverse, since it is easier and cheaper to collect demonstrations from a mix of experts and amateurs. IL in such situations can be challenging, especially when the level of demonstrators' expertise is unknown. We propose a new IL paradigm called Variational Imitation Learning with Diverse-quality demonstrations (VILD), where we explicitly model the level of demonstrators' expertise with a probabilistic graphical model and estimate it along with a reward function. We show that a naive estimation approach is not suitable for large state and action spaces, and we address this issue with a variational approach that can be easily implemented using existing reinforcement learning methods. Experiments on continuous-control benchmarks demonstrate that VILD outperforms state-of-the-art methods. Our work enables scalable and data-efficient IL under more realistic settings than before.

Code: https://www.dropbox.com/sh/jrp87a1aey8jplq/AACh1cFj9ce8tZnqLR9iKq7Ea?dl=0

Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/vild-variational-imitation-learning-with/code)
