- Keywords: Imitation learning, inverse reinforcement learning, noisy demonstrations
- TL;DR: We propose an imitation learning method that learns from diverse-quality demonstrations collected by demonstrators with different levels of expertise.
- Abstract: The goal of imitation learning (IL) is to learn a good policy from high-quality demonstrations. In reality, however, demonstration quality can vary widely, since it is easier and cheaper to collect demonstrations from a mix of experts and amateurs. IL in such settings is challenging, especially when the demonstrators' levels of expertise are unknown. We propose a new IL paradigm called Variational Imitation Learning with Diverse-quality demonstrations (VILD), which explicitly models the demonstrators' levels of expertise with a probabilistic graphical model and estimates them along with a reward function. We show that a naive estimation approach does not scale to large state and action spaces, and address this issue with a variational approach that can be easily implemented using existing reinforcement learning methods. Experiments on continuous-control benchmarks demonstrate that VILD outperforms state-of-the-art methods. Our work enables scalable and data-efficient IL under more realistic settings than previously considered.
- Code: https://www.dropbox.com/sh/jrp87a1aey8jplq/AACh1cFj9ce8tZnqLR9iKq7Ea?dl=0