Keywords: Imitation Learning, Continuous Control
TL;DR: We propose a new imitation learning algorithm that substantially improves sample efficiency for continuous control problems.
Abstract: Imitation learning holds tremendous promise for learning policies efficiently in complex decision-making problems. Current state-of-the-art algorithms often use inverse reinforcement learning (IRL), where, given a set of expert demonstrations, an agent alternately infers a reward function and the associated optimal policy. However, such IRL approaches often require a substantial number of online interactions, particularly for complex control problems. In this work, we present Regularized Optimal Transport (ROT), a new imitation learning algorithm that builds on recent advances in optimal transport based state-matching. Our key technical insight is that adaptively combining state-matching rewards with behavior cloning can significantly accelerate imitation even without task-specific rewards. Our experiments on 19 tasks across the DeepMind Control Suite, the OpenAI Robotics Suite, and the Meta-World Benchmark demonstrate an average of 7.8x faster imitation to reach 90% of expert performance compared to prior state-of-the-art methods.
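To make the key idea concrete, here is a minimal, hypothetical sketch of how a state-matching reward might be adaptively combined with a behavior-cloning objective. The greedy nearest-neighbor matching is only a cheap stand-in for the optimal-transport coupling, and the exponential annealing schedule `lam` is an illustrative assumption, not the adaptive weighting mechanism used in the paper.

```python
import numpy as np

def state_matching_reward(agent_states, expert_states):
    """Toy state-matching reward: negative distance from each agent
    state to its nearest expert state. A cheap proxy for an
    optimal-transport (e.g. Sinkhorn) coupling cost."""
    costs = np.linalg.norm(
        agent_states[:, None, :] - expert_states[None, :, :], axis=-1)
    return -costs.min(axis=1)  # per-step reward, shape (T_agent,)

def combined_loss(rl_loss, bc_loss, step, decay=1e-3):
    """Blend the RL (state-matching) objective with behavior cloning.
    lam anneals the BC weight over training; this fixed schedule is a
    hypothetical illustration of adaptive regularization."""
    lam = np.exp(-decay * step)
    return (1.0 - lam) * rl_loss + lam * bc_loss
```

Early in training `lam` is near 1, so the policy is pulled toward the expert's actions; as training proceeds, the state-matching reward dominates.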
Community Implementations: [![CatalyzeX](/images/catalyzex_icon.svg) 2 code implementations](https://www.catalyzex.com/paper/arxiv:2206.15469/code)