Keywords: Imitation Learning, Continuous Control
TL;DR: We propose a new imitation learning algorithm that substantially improves sample efficiency for continuous control problems.
Abstract: Imitation learning holds tremendous promise for learning policies efficiently in complex decision-making problems. Current state-of-the-art algorithms often use inverse reinforcement learning (IRL), where, given a set of expert demonstrations, an agent alternately infers a reward function and the associated optimal policy. However, such IRL approaches often require a substantial number of online interactions, particularly for complex control problems. In this work, we present Regularized Optimal Transport (ROT), a new imitation learning algorithm that builds on recent advances in optimal transport based state-matching. Our key technical insight is that adaptively combining state-matching rewards with behavior cloning can significantly accelerate imitation even without task-specific rewards. Our experiments on 19 tasks across the DeepMind Control Suite, the OpenAI Robotics Suite, and the Meta-World Benchmark demonstrate an average of 7.8x faster imitation to reach 90% of expert performance compared to prior state-of-the-art methods.
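To make the key idea concrete, here is a minimal, hypothetical sketch of how a state-matching reward might be adaptively combined with a behavior-cloning objective. The greedy nearest-neighbor matching is only a cheap stand-in for the optimal-transport coupling, and the exponential annealing schedule `lam` is an illustrative assumption, not the adaptive weighting mechanism used in the paper.

```python
import numpy as np

def state_matching_reward(agent_states, expert_states):
    """Toy state-matching reward: negative distance from each agent
    state to its nearest expert state. A cheap proxy for an
    optimal-transport (e.g. Sinkhorn) coupling cost."""
    costs = np.linalg.norm(
        agent_states[:, None, :] - expert_states[None, :, :], axis=-1)
    return -costs.min(axis=1)  # per-step reward, shape (T_agent,)

def combined_loss(rl_loss, bc_loss, step, decay=1e-3):
    """Blend the RL (state-matching) objective with behavior cloning.
    lam anneals the BC weight over training; this fixed schedule is a
    hypothetical illustration of adaptive regularization."""
    lam = np.exp(-decay * step)
    return (1.0 - lam) * rl_loss + lam * bc_loss
```

Early in training `lam` is near 1, so the policy is pulled toward the expert's actions; as training proceeds, the state-matching reward dominates.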
Community Implementations: [![CatalyzeX](/images/catalyzex_icon.svg) 2 code implementations](https://www.catalyzex.com/paper/arxiv:2206.15469/code)