Bayesian Optimistic Optimization: Optimistic Exploration for Model-based Reinforcement Learning

Chenyang Wu; Tianci Li; Zongzhang Zhang; Yang Yu

Published: 31 Oct 2022, Last Modified: 14 Oct 2022, NeurIPS 2022 Accept, Readers: Everyone

Keywords: Model-based Reinforcement Learning, Exploration and Exploitation, Optimism in the Face of Uncertainty

TL;DR: This paper proposes a provably-efficient general-purpose model-based reinforcement learning algorithm.

Abstract: Reinforcement learning (RL) is a general framework for modeling sequential decision-making problems, at the core of which lies the dilemma of exploitation and exploration. An agent that fails to explore systematically will inevitably fail to learn efficiently. Optimism in the face of uncertainty (OFU) is a conventionally successful strategy for efficient exploration: an agent following the OFU principle explores actively and efficiently. However, when applied to model-based RL, OFU involves specifying a confidence set for the underlying model and solving a series of nonlinear constrained optimization problems, which can be computationally intractable. This paper proposes an algorithm, Bayesian optimistic optimization (BOO), which adopts a dynamic weighting technique to enforce the constraint rather than explicitly solving a constrained optimization problem. BOO is a general algorithm that is proved to be sample-efficient for models in a finite-dimensional reproducing kernel Hilbert space. We also develop techniques for effective optimization and show through simulation experiments that BOO is competitive with existing algorithms.

Supplementary Material: zip
