Reinforcement Learning with Bayesian Classifiers: Efficient Skill Learning from Outcome Examples

28 Sept 2020 (modified: 05 May 2023) · ICLR 2021 Conference Blind Submission · Readers: Everyone
Keywords: Reinforcement Learning, Goal Reaching, Bayesian Classification, Reward Inference
Abstract: Exploration in reinforcement learning is, in general, a challenging problem. In this work, we study a more tractable class of reinforcement learning problems defined by data that provides examples of successful outcome states. In this case, the reward function can be obtained automatically by training a classifier to distinguish successful states from unsuccessful ones. We argue that, with appropriate representation and regularization, such a classifier can guide a reinforcement learning algorithm to an effective solution. However, as we will show, this requires the classifier to make uncertainty-aware predictions, which is difficult to achieve with standard deep networks. To address this, we propose a novel mechanism for obtaining calibrated uncertainty based on an amortized technique for computing the normalized maximum likelihood distribution. We show that the resulting algorithm has a number of intriguing connections to both count-based exploration methods and prior algorithms for learning reward functions from data, while guiding the agent toward the specified goal more effectively. We show that using amortized normalized maximum likelihood for reward inference provides effective reward guidance for a number of challenging navigation and robotic manipulation tasks that prove difficult for other algorithms.
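
To make the abstract's two ideas concrete, below is a minimal sketch, not the authors' implementation: (1) a success classifier whose predicted probability serves as a reward, and (2) a naive, non-amortized version of conditional normalized maximum likelihood (NML), which the paper approximates with an amortized model to avoid per-query retraining. All function names and the toy data here are illustrative assumptions.

```python
# Minimal sketch, assuming scikit-learn. Illustrative only; the paper uses an
# amortized deep model rather than the per-query retraining shown here.
import numpy as np
from sklearn.linear_model import LogisticRegression


def classifier_reward(clf, state):
    """Reward = P(success | state) under a trained binary classifier."""
    return clf.predict_proba(state.reshape(1, -1))[0, 1]


def cnml_success_prob(X, y, query):
    """Naive conditional NML: retrain once per hypothetical label of the
    query state, then normalize the resulting likelihoods. Amortization
    replaces this per-query retraining with a single learned model."""
    likelihoods = []
    for label in (0, 1):
        X_aug = np.vstack([X, query.reshape(1, -1)])
        y_aug = np.append(y, label)
        clf = LogisticRegression(max_iter=1000).fit(X_aug, y_aug)
        # Likelihood the retrained model assigns to the label it was given.
        likelihoods.append(clf.predict_proba(query.reshape(1, -1))[0, label])
    return likelihoods[1] / sum(likelihoods)


# Toy data: success examples near a goal, negatives from visited states.
rng = np.random.default_rng(0)
goal = np.array([1.0, 1.0])
successes = goal + 0.1 * rng.standard_normal((50, 2))
visited = rng.standard_normal((200, 2))
X = np.vstack([visited, successes])
y = np.concatenate([np.zeros(200), np.ones(50)])

clf = LogisticRegression(max_iter=1000).fit(X, y)
print(classifier_reward(clf, goal))   # standard classifier reward at the goal
print(cnml_success_prob(X, y, goal))  # NML-normalized success probability
```

The naive version retrains the classifier twice for every queried state, which is what makes calibrated NML-style uncertainty expensive and motivates the amortized approximation proposed in the paper.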
One-sentence Summary: Bayesian classifiers allow efficient reinforcement learning and reward inference from outcome examples
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Reviewed Version (pdf): https://openreview.net/references/pdf?id=xdQNZScZRE