On Pathologies in KL-Regularized Reinforcement Learning from Expert Demonstrations

Tim G. J. Rudner; Cong Lu; Michael Osborne; Yarin Gal; Yee Whye Teh

Published: 09 Nov 2021, Last Modified: 05 May 2023 | NeurIPS 2021 Poster | Readers: Everyone

Keywords: Reinforcement Learning, Expert Demonstrations, Behavioral Cloning, Gaussian processes, Uncertainty Quantification

Abstract: KL-regularized reinforcement learning from expert demonstrations has proved successful in improving the sample efficiency of deep reinforcement learning algorithms, allowing them to be applied to challenging physical real-world tasks. However, we show that KL-regularized reinforcement learning with behavioral reference policies derived from expert demonstrations can suffer from pathological training dynamics that can lead to slow, unstable, and suboptimal online learning. We show empirically that the pathology occurs for commonly chosen behavioral policy classes and demonstrate its impact on sample efficiency and online policy performance. Finally, we show that the pathology can be remedied by non-parametric behavioral reference policies and that this allows KL-regularized reinforcement learning to significantly outperform state-of-the-art approaches on a variety of challenging locomotion and dexterous hand manipulation tasks.
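The abstract does not spell out the objective, so the following is only a minimal sketch of what a KL penalty toward a behavioral reference policy can look like, not the authors' implementation. It assumes univariate Gaussian policies, a penalty of the form reward minus alpha times KL(online || reference), and hypothetical names (`gaussian_kl`, `kl_regularized_reward`, `alpha`). Under those assumptions it shows how the penalty grows when the reference policy's standard deviation becomes very small, which is one way a KL term can come to dominate the learning signal.

```python
import math


def gaussian_kl(mu_p, sigma_p, mu_q, sigma_q):
    """Closed-form KL(N(mu_p, sigma_p^2) || N(mu_q, sigma_q^2))."""
    return (
        math.log(sigma_q / sigma_p)
        + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2.0 * sigma_q ** 2)
        - 0.5
    )


def kl_regularized_reward(reward, kl, alpha):
    """Reward with a KL penalty; alpha is an assumed temperature."""
    return reward - alpha * kl


if __name__ == "__main__":
    # Online policy held fixed (assumed values); only the reference
    # policy's standard deviation changes between rows.
    mu_online, sigma_online = 0.5, 1.0
    mu_ref = 0.0
    for sigma_ref in (1.0, 0.1, 0.01):
        kl = gaussian_kl(mu_online, sigma_online, mu_ref, sigma_ref)
        penalized = kl_regularized_reward(reward=1.0, kl=kl, alpha=0.1)
        print(f"sigma_ref={sigma_ref:<5} KL={kl:10.3f} penalized={penalized:10.3f}")
```

In this toy setting the penalty scales roughly with 1 / sigma_ref^2, so a reference policy that is overconfident in a given state can swamp the task reward. Whether and where this happens in practice depends on the behavioral policy class, which is the question the paper studies.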

TL;DR: This paper identifies pathological behavior in KL-regularized reinforcement learning from expert demonstrations and shows that fixing this pathology leads to a significant improvement in performance on manipulation and locomotion tasks.

Supplementary Material: pdf

Code: https://sites.google.com/view/nppac
