TL;DR: We develop a new planning algorithm that handles uncertainty by tracking probability distributions instead of single values, achieving 20-80% better performance in unpredictable environments.
Abstract: This paper introduces a novel backup strategy for Monte-Carlo Tree Search (MCTS), tailored to highly stochastic and partially observable Markov decision processes. We adopt a probabilistic approach, modeling both value and action-value nodes as Gaussian distributions, and introduce a novel backup operator that computes each value node as the Wasserstein barycenter of its action-value children, thus propagating the uncertainty of the estimates across the tree to the root node. We study this backup operator under a novel combination of the $L^1$-Wasserstein barycenter with the $\alpha$-divergence, drawing a crucial connection to the generalized mean backup operator. We complement our probabilistic backup operator with two sampling strategies, based on optimistic selection and Thompson sampling, obtaining our Wasserstein MCTS algorithm. We provide theoretical guarantees of asymptotic convergence to the optimal policy at rate $\mathcal{O}(n^{-1/2})$, where $n$ is the number of visited trajectories, and an empirical evaluation on several stochastic and partially observable environments, where our approach outperforms well-known related baselines.
Lay Summary: Imagine you're playing a complex game where the rules keep changing randomly, and you can't see all the information you need to make good decisions. Traditional computer algorithms that plan moves ahead (like those used in chess or Go) struggle in these uncertain situations because they assume the world is predictable and all information is available.

Our research introduces a new planning algorithm called "Wasserstein MCTS" that's specifically designed to handle uncertainty and randomness. Instead of just tracking single "best guess" values for each possible move, our algorithm keeps track of entire probability distributions - essentially maintaining a range of possible outcomes and their likelihoods. The key innovation is how we combine information from different possible moves. We use a mathematical technique called "Wasserstein barycenters" (think of it as a sophisticated way of averaging probability distributions) that allows the algorithm to properly account for uncertainty when deciding which moves to explore. This is like having a chess player who not only considers the most likely outcome of each move, but also weighs the full range of possible results and their uncertainties.

We tested our algorithm on various challenging scenarios - from navigating slippery frozen lakes where movements are unpredictable, to complex maze-like environments where important information is hidden. In these tests, our approach consistently outperformed existing methods, often by substantial margins (20-80% improvement in many cases).

This research has practical implications for real-world applications like robot navigation in unpredictable environments, autonomous vehicle planning in uncertain traffic conditions, and resource management where outcomes depend on many random factors. By better handling uncertainty, our algorithm makes more robust decisions in situations where traditional planning methods might fail.
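To make the "sophisticated averaging" of the summary concrete, here is a minimal sketch of the two ingredients described above: combining Gaussian value estimates via a barycenter, and selecting actions by Thompson sampling. This is not the paper's implementation; the names (`GaussianValue`, `wasserstein_barycenter`, `thompson_select`) are illustrative, and the barycenter shown is the well-known closed form for the $L^2$-Wasserstein barycenter of 1-D Gaussians (weighted average of means and of standard deviations), used here as a stand-in for the paper's $L^1$/$\alpha$-divergence variant.

```python
import random
from dataclasses import dataclass

@dataclass
class GaussianValue:
    """A node's value estimate as a Gaussian (mean, std)."""
    mean: float
    std: float

def wasserstein_barycenter(children, weights):
    """Barycenter of 1-D Gaussian estimates: weighted average of means
    and of standard deviations (the W2 closed form; the paper's L1
    variant with alpha-divergence differs in detail).  Propagating the
    std, not just the mean, is what carries uncertainty up the tree."""
    mean = sum(w * c.mean for w, c in zip(weights, children))
    std = sum(w * c.std for w, c in zip(weights, children))
    return GaussianValue(mean, std)

def thompson_select(children):
    """Thompson sampling: draw one sample from each child's Gaussian
    estimate and pick the action whose draw is largest, so uncertain
    actions still get explored."""
    samples = [random.gauss(c.mean, c.std) for c in children]
    return max(range(len(children)), key=lambda i: samples[i])
```

In this sketch, a value node whose two action children are estimated as `GaussianValue(0, 1)` and `GaussianValue(2, 3)` with equal weights would be backed up to `GaussianValue(1.0, 2.0)`: the surviving standard deviation is how the root "knows" how unsure it still is.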
Primary Area: Reinforcement Learning->Planning
Keywords: Monte-Carlo Tree Search, Planning under Uncertainty
Submission Number: 15359