Efficient Inverse Reinforcement Learning without Compounding Errors

Published: 07 Aug 2024, Last Modified: 24 Aug 2024
RLSW 2024 Talk (Poster)
License: CC BY 4.0
Keywords: inverse reinforcement learning, imitation learning, distribution shift, policy completeness
TL;DR: We present a novel structural condition under which IRL can both be efficient and avoid quadratically compounding errors.
Abstract: Inverse reinforcement learning (IRL) is an on-policy approach to imitation learning (IL) that allows the learner to observe the consequences of their actions at train time. Accordingly, there are two seemingly contradictory desiderata for IRL algorithms: (a) preventing the compounding errors that stymie offline approaches like behavioral cloning and (b) avoiding the worst-case exploration complexity of reinforcement learning (RL). Prior work has been able to achieve either (a) or (b), but not both simultaneously. In our work, we first present a negative result showing that, without further assumptions, there are no efficient IRL algorithms that avoid compounding errors in the worst case. We then provide a positive result: under a novel structural condition we term reward-agnostic policy completeness, we prove that efficient IRL algorithms do avoid compounding errors, giving us the best of both worlds. Finally, we address a practical constraint, the case of limited expert data, and propose a principled method for using sub-optimal data to further improve the sample efficiency of IRL algorithms.
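Background: the compounding-errors contrast referenced in the abstract is the classical imitation-learning gap; the sketch below follows the well-known analysis of Ross & Bagnell (2010) and uses illustrative notation ($J$, $\pi_E$, $\epsilon$, $T$) rather than this paper's own. If behavioral cloning achieves per-state error $\epsilon$ measured under the expert's state distribution, an early mistake can push the learner into states it was never trained on, so the worst-case performance gap over horizon $T$ compounds quadratically:

$$ J(\pi_E) - J(\pi_{\mathrm{BC}}) \le O(\epsilon T^2). $$

On-policy approaches such as IRL instead incur error under the learner's own state distribution, which keeps the worst-case gap linear in the horizon:

$$ J(\pi_E) - J(\pi_{\mathrm{IRL}}) \le O(\epsilon T). $$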
Submission Number: 7