VILD: Variational Imitation Learning with Diverse-quality Demonstrations

Sep 25, 2019 Blind Submission readers: everyone
  • Keywords: Imitation learning, inverse reinforcement learning, noisy demonstrations
  • TL;DR: We propose an imitation learning method to learn from diverse-quality demonstrations collected by demonstrators with different levels of expertise.
  • Abstract: The goal of imitation learning (IL) is to learn a good policy from high-quality demonstrations. However, the quality of demonstrations in reality can be diverse, since it is easier and cheaper to collect demonstrations from a mix of experts and amateurs. IL in such situations can be challenging, especially when the level of demonstrators' expertise is unknown. We propose a new IL paradigm called Variational Imitation Learning with Diverse-quality demonstrations (VILD), where we explicitly model the level of demonstrators' expertise with a probabilistic graphical model and estimate it along with a reward function. We show that a naive estimation approach is not suitable for large state and action spaces, and fix this issue by using a variational approach that can be easily implemented using existing reinforcement learning methods. Experiments on continuous-control benchmarks demonstrate that VILD outperforms state-of-the-art methods. Our work enables scalable and data-efficient IL under more realistic settings than before.
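
The abstract's central idea, estimating each demonstrator's level of expertise jointly with the quantity being learned, can be illustrated with a toy sketch. This is not the VILD algorithm: VILD estimates a reward function with a variational approach and reinforcement learning, whereas the sketch below is a plain supervised regression. It assumes (none of this is stated in the abstract) a linear expert policy, demonstrators who add Gaussian noise of an unknown per-demonstrator scale to the expert's action, and alternating maximum-likelihood updates. All function and variable names are hypothetical.

```python
import numpy as np


def simulate_demonstrations(true_w, noise_stds, n_per_demo, rng):
    """Generate (state, action, demonstrator id) data.

    Assumption: demonstrator k executes the expert action s @ true_w
    corrupted by Gaussian noise with standard deviation noise_stds[k].
    """
    states, actions, demo_ids = [], [], []
    for k, std in enumerate(noise_stds):
        s = rng.normal(size=(n_per_demo, true_w.shape[0]))
        a = s @ true_w + rng.normal(scale=std, size=n_per_demo)
        states.append(s)
        actions.append(a)
        demo_ids.append(np.full(n_per_demo, k))
    return np.concatenate(states), np.concatenate(actions), np.concatenate(demo_ids)


def fit_policy_and_noise(states, actions, demo_ids, n_demos, n_iters=20):
    """Alternate between two maximum-likelihood updates.

    1. Weighted least squares for the policy weights, trusting each
       sample in proportion to 1 / sigma_k^2 of its demonstrator.
    2. Per-demonstrator noise variance from the mean squared residual.
    """
    sigma2 = np.ones(n_demos)  # start by trusting everyone equally
    w = np.zeros(states.shape[1])
    for _ in range(n_iters):
        sample_weights = 1.0 / sigma2[demo_ids]
        weighted_states = states * sample_weights[:, None]
        # Solve (X^T W X) w = X^T W y for the policy weights.
        w = np.linalg.solve(states.T @ weighted_states, weighted_states.T @ actions)
        residuals = actions - states @ w
        for k in range(n_demos):
            # Floor the variance to avoid division by zero.
            sigma2[k] = max(np.mean(residuals[demo_ids == k] ** 2), 1e-8)
    return w, np.sqrt(sigma2)


rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0, 0.5])
noise_stds = [0.1, 1.0, 3.0]  # one expert, one intermediate, one amateur
states, actions, demo_ids = simulate_demonstrations(true_w, noise_stds, 500, rng)
w_hat, std_hat = fit_policy_and_noise(states, actions, demo_ids, len(noise_stds))
print("estimated policy weights:", w_hat)
print("estimated noise levels:", std_hat)
```

In this toy setting the estimated noise levels rank the demonstrators by expertise, and the noisier demonstrators are automatically down-weighted when fitting the policy. The abstract's point about large state and action spaces concerns the harder reinforcement-learning setting, which this sketch does not attempt to cover.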