Planning to the Information Horizon of BAMDPs via Epistemic State Abstraction

Dilip Arumugam; Satinder Singh

Planning to the Information Horizon of BAMDPs via Epistemic State Abstraction

Dilip Arumugam, Satinder Singh

Published: 31 Oct 2022, Last Modified: 11 Jan 2023NeurIPS 2022 AcceptReaders: Everyone

Keywords: Bayes-Adaptive Markov Decision Process, Bayesian reinforcement learning, Exploration, Planning

Abstract: The Bayes-Adaptive Markov Decision Process (BAMDP) formalism pursues the Bayes-optimal solution to the exploration-exploitation trade-off in reinforcement learning. As the computation of exact solutions to Bayesian reinforcement-learning problems is intractable, much of the literature has focused on developing suitable approximation algorithms. In this work, before diving into algorithm design, we first define, under mild structural assumptions, a complexity measure for BAMDP planning. As efficient exploration in BAMDPs hinges upon the judicious acquisition of information, our complexity measure highlights the worst-case difficulty of gathering information and exhausting epistemic uncertainty. To illustrate its significance, we establish a computationally-intractable, exact planning algorithm that takes advantage of this measure to show more efficient planning. We then conclude by introducing a specific form of state abstraction with the potential to reduce BAMDP complexity and gives rise to a computationally-tractable, approximate planning algorithm.

Supplementary Material: pdf

23 Replies

Loading