Keywords: Model-Based Reinforcement Learning, Active Learning, Gaussian Processes
Abstract: In this work, we study the problem of sample-efficient exploration in Model-Based Reinforcement Learning (MBRL). While most popular exploration methods in MBRL are "reactive" in nature, and thus inherently sample-inefficient, we discuss the benefits of an "active" approach, where the agent selects actions to query novel states in a data-efficient way, provided that one can guarantee that regions of high epistemic, and not aleatoric, uncertainty are targeted. To ensure this, we consider popular exploration bonuses based on Bayesian surprise, and demonstrate their desirable properties under the assumption of a Gaussian Process model. We then introduce a novel exploration method, Bayesian Active Exploration, where the agent queries transitions based on a multi-step predictive search aimed at maximizing the expected information gain. Moreover, we propose alternative dynamics model specifications based on stochastic variational Gaussian Processes and deep kernels that scale better with sample size and with the dimensionality of the state-action space, and accommodate non-tabular inputs by learning a latent representation, while maintaining good uncertainty-quantification properties.
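To make the information-gain criterion concrete, below is a minimal sketch of active action selection under a Gaussian Process dynamics model. It is an illustrative one-step greedy variant, not the paper's multi-step Bayesian Active Exploration procedure; the toy dynamics, `expected_information_gain` helper, and all hyperparameter values are assumptions made for the example. For a GP, the information gained from one noisy transition is 0.5 * log(1 + sigma_f^2 / sigma_n^2), where sigma_f^2 is the epistemic (latent-function) variance and sigma_n^2 the observation-noise variance, so maximizing it targets epistemic rather than aleatoric uncertainty.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Toy 1-D state / 1-D action dynamics data (assumed for illustration):
# inputs are (state, action) pairs, targets are next states.
rng = np.random.default_rng(0)
X_train = rng.uniform(-1.0, 1.0, size=(50, 2))
y_train = np.sin(X_train[:, 0] + X_train[:, 1]) + 0.1 * rng.standard_normal(50)

# GP dynamics model; the WhiteKernel absorbs the aleatoric (noise) variance.
gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(noise_level=1e-2))
gp.fit(X_train, y_train)
noise_var = gp.kernel_.k2.noise_level  # fitted aleatoric variance

def expected_information_gain(state, actions):
    """One-step expected information gain of querying (state, a) for each candidate a."""
    queries = np.column_stack([np.full(len(actions), state), actions])
    _, std = gp.predict(queries, return_std=True)        # predictive std includes noise
    epistemic_var = np.maximum(std**2 - noise_var, 0.0)  # remove the aleatoric part
    return 0.5 * np.log1p(epistemic_var / noise_var)

# Greedy one-step active query: pick the action with the largest expected information gain.
candidate_actions = np.linspace(-1.0, 1.0, 101)
ig = expected_information_gain(state=0.3, actions=candidate_actions)
best_action = candidate_actions[np.argmax(ig)]
print(f"most informative action: {best_action:.2f} (IG = {ig.max():.3f} nats)")
```

The same criterion can in principle be rolled out over imagined multi-step trajectories, as the abstract describes, by summing the per-transition gains along each candidate action sequence.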
Supplementary Material: zip
Submission Number: 94