Understanding the Relation Between Maximum-Entropy Inverse Reinforcement Learning and Behaviour CloningDownload PDF

27 Mar 2019, 19:59 (modified: 11 Jul 2022, 20:40)DeepGenStruct 2019Readers: Everyone
Keywords: Inverse Reinforcement Learning, Behaviour Cloning, f-divergence, distribution matching
TL;DR: Distribution matching through divergence minimization provides a common ground for comparing adversarial Maximum-Entropy Inverse Reinforcement Learning methods to Behaviour Cloning.
Abstract: In many settings, it is desirable to learn decision-making and control policies through learning or from expert demonstrations. The most common approaches under this framework are Behaviour Cloning (BC), and Inverse Reinforcement Learning (IRL). Recent methods for IRL have demonstrated the capacity to learn effective policies with access to a very limited set of demonstrations, a scenario in which BC methods often fail. Unfortunately, directly comparing the algorithms for these methods does not provide adequate intuition for understanding this difference in performance. This is the motivating factor for our work. We begin by presenting $f$-MAX, a generalization of AIRL (Fu et al., 2018), a state-of-the-art IRL method. $f$-MAX provides grounds for more directly comparing the objectives for LfD. We demonstrate that $f$-MAX, and by inheritance AIRL, is a subset of the cost-regularized IRL framework laid out by Ho & Ermon (2016). We conclude by empirically evaluating the factors of difference between various LfD objectives in the continuous control domain.
3 Replies