Abstraction for Bayesian Reinforcement Learning in Factored POMDPs

TMLR Paper 4004 Authors

17 Jan 2025 (modified: 26 Jan 2025) · Under review for TMLR · CC BY 4.0
Abstract: Bayesian reinforcement learning provides an elegant approach to the exploration-exploitation trade-off in Partially Observable Markov Decision Processes (POMDPs) when the environment's dynamics and reward function are initially unknown. By maintaining a belief over these unknown components and the state, the agent can effectively learn the environment's dynamics and optimize its policy. However, scaling Bayesian reinforcement learning methods to large problems remains a significant challenge. While prior work has leveraged factored models and online sample-based planning to address this issue, these approaches often retain unnecessarily complex models, keeping factors in the belief space that have minimal impact on the optimal policy. While this complexity might be necessary for accurate model learning, in reinforcement learning the primary objective is not to recover the ground-truth model but to optimize the policy for maximizing the expected sum of rewards. Abstraction offers a way to reduce model complexity by removing factors that are less relevant to achieving high rewards. In this work, we propose and analyze the integration of abstraction with online planning in factored POMDPs. Our empirical results demonstrate two key benefits. First, abstraction reduces model size, enabling faster simulations and thus more planning simulations within a fixed runtime. Second, abstraction improves performance even for a fixed number of simulations, owing to the greater statistical strength of the simpler model. These results underscore the potential of abstraction to improve both the scalability and effectiveness of Bayesian reinforcement learning in factored POMDPs.
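The abstract's central mechanism, projecting a factored model onto its reward-relevant factors so that each planning simulation touches fewer variables, can be illustrated with a minimal Python sketch. All names here (`FACTORS`, `KEEP`, `abstract_state`, `simulate_step`) are illustrative assumptions of ours, not the paper's actual algorithm or API.

```python
import random

# Hypothetical illustration (FACTORS, KEEP, and both functions are our own
# inventions, not from the paper): a factored POMDP state is a dict of
# factors; abstraction projects the state and the learned dynamics onto a
# reward-relevant subset of those factors.

FACTORS = ["position", "battery", "weather", "clock"]   # full factored state
KEEP = {"position", "battery"}                          # reward-relevant factors

def abstract_state(state: dict) -> dict:
    """Project a factored state onto the kept (reward-relevant) factors."""
    return {f: v for f, v in state.items() if f in KEEP}

def simulate_step(state: dict, dynamics: dict) -> dict:
    """One planning simulation step: sample each kept factor's next value
    from its learned conditional distribution (values, weights)."""
    return {f: random.choices(vals, weights)[0]
            for f, (vals, weights) in dynamics.items() if f in state}

# Fewer factors mean fewer conditional distributions to sample and update,
# so each Monte-Carlo planning simulation is cheaper; this is the first
# benefit the abstract describes.
full = {"position": 0, "battery": 5, "weather": "rain", "clock": 12}
dynamics = {"position": ([0, 1], [0.5, 0.5]), "battery": ([4, 5], [0.3, 0.7])}
print(simulate_step(abstract_state(full), dynamics))
```

The second claimed benefit follows from the same projection: with fewer factors, each learned conditional distribution is estimated from more pooled experience, which is one plausible reading of the "greater statistical strength" the abstract mentions.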
Submission Length: Long submission (more than 12 pages of main content)
Assigned Action Editor: ~Martha_White1
Submission Number: 4004
