Learning state-variable relationships for improving POMCP performance

Maddalena Zuccotto, Alberto Castellini, Alessandro Farinelli

Published: 2022, Last Modified: 06 Mar 2025, SAC 2022, CC BY-SA 4.0

Abstract: We address the problem of learning state-variable relationships across different episodes in Partially Observable Markov Decision Processes (POMDPs) to improve planning performance. Specifically, we focus on Partially Observable Monte Carlo Planning (POMCP) and represent the acquired knowledge with Markov Random Fields (MRFs). We propose three different methods to compute MRF parameters while the agent acts in the environment. Our techniques acquire information from the outcomes of the agent's actions and from the agent's belief, which summarizes the knowledge acquired from observations. We also propose a stopping criterion to determine when the MRF is accurate enough for the learning process to be stopped. Results show that the proposed approach effectively learns state-variable probabilistic constraints and outperforms standard POMCP with no computational overhead.
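
To make the idea concrete, here is a minimal Python sketch of one way pairwise MRF parameters could be estimated from the agent's belief across episodes, together with a simple convergence-based stopping rule. It is an illustration under stated assumptions, not the authors' method: the class `PairwiseMRFLearner`, the "probability that two state variables are equal" parameterization, the per-episode averaging, and the tolerance `tol` are all hypothetical choices made for this example.

```python
from itertools import combinations


class PairwiseMRFLearner:
    """Hypothetical learner: one parameter per variable pair, estimating
    the probability that the two state variables take the same value."""

    def __init__(self, n_vars, tol=0.01):
        self.n_vars = n_vars
        self.tol = tol  # assumed stopping threshold, not from the paper
        self.pairs = list(combinations(range(n_vars), 2))
        self.equal_sum = {pair: 0.0 for pair in self.pairs}
        # Start from an uninformative value of 0.5 for every pair.
        self.potentials = {pair: 0.5 for pair in self.pairs}
        self.episodes = 0

    def update(self, belief_particles):
        """Fold one episode's belief (a list of state tuples) into the
        running estimates and return the largest parameter change."""
        n = len(belief_particles)
        self.episodes += 1
        max_change = 0.0
        for i, j in self.pairs:
            # Fraction of particles in which variables i and j agree.
            agree = sum(1 for s in belief_particles if s[i] == s[j]) / n
            self.equal_sum[(i, j)] += agree
            new_value = self.equal_sum[(i, j)] / self.episodes
            max_change = max(max_change, abs(new_value - self.potentials[(i, j)]))
            self.potentials[(i, j)] = new_value
        return max_change

    def converged(self, max_change):
        """Assumed stopping rule: stop once estimates barely move."""
        return self.episodes > 1 and max_change < self.tol
```

In this sketch, `update` would be called once per episode with the particles of the final belief, and learning would stop the first time `converged` returns `True`. The learned values could then be used to bias the initial belief in later episodes, which is the role the abstract assigns to the MRF.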