LS-IQ: Implicit Reward Regularization for Inverse Reinforcement Learning

Firas Al-Hafez; Davide Tateo; Oleg Arenz; Guoping Zhao; Jan Peters

LS-IQ: Implicit Reward Regularization for Inverse Reinforcement Learning

Firas Al-Hafez, Davide Tateo, Oleg Arenz, Guoping Zhao, Jan Peters

Published: 01 Feb 2023, Last Modified: 12 Mar 2024ICLR 2023 posterReaders: Everyone

Keywords: Inverse Reinforcement Learning, Imitation Learning, Reward Regularization, Deep Reinforcement Learning

TL;DR: We propose a novel perspective on implicit L2 reward regularization for inverse reinforcement learning.

Abstract: Recent methods for imitation learning directly learn a $Q$-function using an implicit reward formulation rather than an explicit reward function. However, these methods generally require implicit reward regularization to improve stability and often mistreat absorbing states. Previous works show that a squared norm regularization on the implicit reward function is effective, but do not provide a theoretical analysis of the resulting properties of the algorithms. In this work, we show that using this regularizer under a mixture distribution of the policy and the expert provides a particularly illuminating perspective: the original objective can be understood as squared Bellman error minimization, and the corresponding optimization problem minimizes a bounded $\chi^2$-Divergence between the expert and the mixture distribution. This perspective allows us to address instabilities and properly treat absorbing states. We show that our method, Least Squares Inverse Q-Learning (LS-IQ), outperforms state-of-the-art algorithms, particularly in environments with absorbing states. Finally, we propose to use an inverse dynamics model to learn from observations only. Using this approach, we retain performance in settings where no expert actions are available.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics

Submission Guidelines: Yes

Please Choose The Closest Area That Your Submission Falls Into: Reinforcement Learning (eg, decision and control, planning, hierarchical RL, robotics)

Supplementary Material: zip

Community Implementations: [![CatalyzeX](/images/catalyzex_icon.svg) 1 code implementation](https://www.catalyzex.com/paper/arxiv:2303.00599/code)

11 Replies

Loading