Keywords: Model-Based Reinforcement Learning, Active Learning, Gaussian Processes
Abstract: In this work, we study the problem of sample-efficient exploration in Model-Based Reinforcement Learning (MBRL). While most popular exploration methods in MBRL are "reactive" in nature, and thus inherently sample-inefficient, we discuss the benefits of an "active" approach, where the agent selects actions to query novel states in a data-efficient way, provided that one can guarantee that regions of high epistemic, and not aleatoric, uncertainty are targeted. To ensure this, we consider popular exploration bonuses based on Bayesian surprise, and demonstrate their desirable properties under the assumption of a Gaussian Process model. We then introduce a novel exploration method, Bayesian Active Exploration, where the agent queries transitions based on a multi-step predictive search aimed at maximizing the expected information gain. Moreover, we propose alternative dynamics model specifications based on stochastic variational Gaussian Processes and deep kernels that scale better with sample size and with the dimensionality of the state-action space, and accommodate non-tabular inputs by learning a latent representation, while maintaining good uncertainty-quantification properties.
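To make the information-gain criterion concrete, below is a minimal sketch of active action selection under a Gaussian Process dynamics model. It is an illustrative one-step greedy variant, not the paper's multi-step Bayesian Active Exploration procedure; the toy dynamics, `expected_information_gain` helper, and all hyperparameter values are assumptions made for the example. For a GP, the information gained from one noisy transition is 0.5 * log(1 + sigma_f^2 / sigma_n^2), where sigma_f^2 is the epistemic (latent-function) variance and sigma_n^2 the observation-noise variance, so maximizing it targets epistemic rather than aleatoric uncertainty.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Toy 1-D state / 1-D action dynamics data (assumed for illustration):
# inputs are (state, action) pairs, targets are next states.
rng = np.random.default_rng(0)
X_train = rng.uniform(-1.0, 1.0, size=(50, 2))
y_train = np.sin(X_train[:, 0] + X_train[:, 1]) + 0.1 * rng.standard_normal(50)

# GP dynamics model; the WhiteKernel absorbs the aleatoric (noise) variance.
gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(noise_level=1e-2))
gp.fit(X_train, y_train)
noise_var = gp.kernel_.k2.noise_level  # fitted aleatoric variance

def expected_information_gain(state, actions):
    """One-step expected information gain of querying (state, a) for each candidate a."""
    queries = np.column_stack([np.full(len(actions), state), actions])
    _, std = gp.predict(queries, return_std=True)        # predictive std includes noise
    epistemic_var = np.maximum(std**2 - noise_var, 0.0)  # remove the aleatoric part
    return 0.5 * np.log1p(epistemic_var / noise_var)

# Greedy one-step active query: pick the action with the largest expected information gain.
candidate_actions = np.linspace(-1.0, 1.0, 101)
ig = expected_information_gain(state=0.3, actions=candidate_actions)
best_action = candidate_actions[np.argmax(ig)]
print(f"most informative action: {best_action:.2f} (IG = {ig.max():.3f} nats)")
```

The same criterion can in principle be rolled out over imagined multi-step trajectories, as the abstract describes, by summing the per-transition gains along each candidate action sequence.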
Supplementary Material: zip
Submission Number: 94