Belief Re-Use in Partially Observable Monte Carlo Tree Search

Published: 01 Jan 2025, Last Modified: 22 Jul 2025. ICAART (2) 2025. License: CC BY-SA 4.0
Abstract: Partially observable Markov decision processes (POMDPs) require agents to make decisions under incomplete information, facing challenges such as exponential growth in the number of belief states and action-observation histories. Monte Carlo tree search (MCTS) is commonly used to solve such problems online, but it redundantly evaluates identical belief states reached through different action sequences. We propose Belief Re-use in Online Partially Observable Planning (BROPOP), a technique that transforms the MCTS tree into a graph by merging nodes with similar beliefs. Using a POMDP-specific locality-sensitive hashing method, BROPOP efficiently identifies and reuses belief nodes while preserving information integrity through update-descent backpropagation. Experiments on standard benchmarks show that BROPOP improves reward performance at a controlled computational cost.
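The central idea sketched in the abstract, hashing particle-based beliefs so that nodes with similar beliefs collide and can be merged, can be illustrated with a small example. The sketch below is an assumption-laden illustration, not the paper's implementation: the class name `BeliefLSH`, the use of random-hyperplane hashing over averaged state features, and the toy featurization are all hypothetical choices made for clarity.

```python
import numpy as np


class BeliefLSH:
    """Locality-sensitive hash over particle beliefs (illustrative sketch only).

    A belief is represented by a set of particles (sampled states). Each belief
    is summarized as the mean of a user-supplied state featurization and hashed
    with random hyperplanes, so beliefs with similar particle distributions tend
    to map to the same bucket and their search nodes can be merged.
    """

    def __init__(self, feature_dim, n_bits=16, seed=0):
        rng = np.random.default_rng(seed)
        # Random hyperplanes: the sign pattern of the projections is the hash key.
        self.planes = rng.normal(size=(n_bits, feature_dim))

    def key(self, particles, featurize):
        # Summarize the belief as the average feature vector of its particles.
        feats = np.mean([featurize(s) for s in particles], axis=0)
        bits = (self.planes @ feats) > 0.0
        return bits.tobytes()  # hashable key for a node-merging dictionary


# Usage sketch: merge search nodes whose beliefs hash to the same bucket,
# turning the tree into a graph with shared belief nodes.
if __name__ == "__main__":
    def featurize(state):
        # Toy featurization: one-hot encoding over 4 discrete states.
        v = np.zeros(4)
        v[state] = 1.0
        return v

    lsh = BeliefLSH(feature_dim=4, n_bits=8, seed=1)
    nodes = {}  # hash key -> merged belief node

    belief_a = [0, 0, 1, 2]  # particles reached via one action sequence
    belief_b = [0, 1, 0, 2]  # similar particles reached via a different sequence

    for belief in (belief_a, belief_b):
        k = lsh.key(belief, featurize)
        node = nodes.setdefault(k, {"particles": [], "visits": 0})
        node["particles"].extend(belief)  # re-use the existing node
        node["visits"] += 1

    print(f"distinct merged nodes: {len(nodes)}")  # prints 1: the beliefs collide
```

In this toy run the two beliefs have the same particle distribution, so they hash to the same bucket and share a single node; how collisions are detected, how merged statistics are maintained, and how values are propagated back (the paper's update-descent backpropagation) are specified in the paper itself.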