Understanding the Relation Between Maximum-Entropy Inverse Reinforcement Learning and Behaviour Cloning

Seyed Kamyar Seyed Ghasemipour; Shane Gu; Richard Zemel

Published: 03 May 2019, Last Modified: 05 May 2023

Venue: DeepGenStruct 2019

Readers: Everyone

Keywords: Inverse Reinforcement Learning, Behaviour Cloning, f-divergence, distribution matching

TL;DR: Distribution matching through divergence minimization provides a common ground for comparing adversarial Maximum-Entropy Inverse Reinforcement Learning methods to Behaviour Cloning.

Abstract: In many settings, it is desirable to learn decision-making and control policies from expert demonstrations, a problem known as Learning from Demonstrations (LfD). The most common approaches under this framework are Behaviour Cloning (BC) and Inverse Reinforcement Learning (IRL). Recent methods for IRL have demonstrated the capacity to learn effective policies with access to a very limited set of demonstrations, a scenario in which BC methods often fail. Unfortunately, directly comparing the algorithms for these methods does not provide adequate intuition for understanding this difference in performance, which is the motivation for our work. We begin by presenting $f$-MAX, a generalization of AIRL (Fu et al., 2018), a state-of-the-art IRL method. $f$-MAX provides grounds for more directly comparing the objectives of LfD methods. We demonstrate that $f$-MAX, and by inheritance AIRL, is a subset of the cost-regularized IRL framework laid out by Ho & Ermon (2016). We conclude by empirically evaluating the factors that differ between various LfD objectives in the continuous control domain.
