Sound Heuristic Search Value Iteration for Undiscounted POMDPs with Reachability Objectives

Published: 26 Apr 2024, Last Modified: 15 Jul 2024, UAI 2024 poster, CC BY 4.0
Keywords: Partially Observable Markov Decision Processes, Planning under Uncertainty, Probabilistic Model Checking, Heuristic Search, Point Based Methods
TL;DR: A novel heuristic search algorithm for POMDPs with Maximal Reachability Probability objectives.
Abstract: Partially Observable Markov Decision Processes (POMDPs) are powerful models for sequential decision making under transition and observation uncertainties. This paper studies the challenging yet important problem in POMDPs known as the (indefinite-horizon) Maximal Reachability Probability Problem (MRPP), where the goal is to maximize the probability of reaching some target states. This is also a core problem in model checking with logical specifications and is naturally undiscounted (discount factor is one). Inspired by the success of point-based methods developed for discounted problems, we study their extensions to MRPP. Specifically, we focus on trial-based heuristic search value iteration techniques and present a novel algorithm that leverages the strengths of these techniques for efficient exploration of the belief space (informed search via value bounds) while addressing their drawbacks in handling loops for indefinite-horizon problems. The algorithm produces policies with two-sided bounds on optimal reachability probabilities. We prove convergence to an optimal policy from below under certain conditions. Experimental evaluations on a suite of benchmarks show that our algorithm outperforms existing methods in almost all cases in both probability guarantees and computation time.
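Illustration: to make the undiscounted reachability objective concrete, here is a minimal sketch of value iteration for maximal reachability probability on a fully observable toy MDP. This is not the authors' algorithm and it ignores partial observability entirely; the toy model, the state and action names, and the function `max_reach_lower_bound` are all hypothetical and chosen only for illustration. It shows two ideas named in the abstract: the discount factor is one, and the estimate approaches the optimal value from below, even though the best strategy involves a loop.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy MDP (an assumption, not taken from the paper):
# state -> action -> list of (next_state, probability).
Transitions = Dict[str, Dict[str, List[Tuple[str, float]]]]

TOY_MDP: Transitions = {
    "s0": {
        # Retrying loops back to s0 with probability 0.5.
        "try": [("goal", 0.5), ("s0", 0.5)],
        # The risky action may end in an absorbing non-target state.
        "risky": [("goal", 0.7), ("sink", 0.3)],
    },
    "sink": {"stay": [("sink", 1.0)]},
}


def max_reach_lower_bound(
    trans: Transitions, targets: Set[str], sweeps: int
) -> Dict[str, float]:
    """Run `sweeps` synchronous Bellman backups with discount factor one.

    Starting from 0 on non-target states, each iterate is a lower bound
    on the maximal probability of eventually reaching a target state.
    """
    states = set(trans) | set(targets)
    value = {s: (1.0 if s in targets else 0.0) for s in states}
    for _ in range(sweeps):
        new_value = dict(value)
        for s, actions in trans.items():
            if s in targets:
                continue  # target states keep value 1
            # Undiscounted backup: best expected successor value.
            new_value[s] = max(
                sum(p * value[t] for t, p in successors)
                for successors in actions.values()
            )
        value = new_value
    return value


if __name__ == "__main__":
    for k in (1, 2, 3, 20):
        v = max_reach_lower_bound(TOY_MDP, {"goal"}, k)
        print(k, v["s0"])
```

In this toy model the optimal reachability probability from `s0` is 1 (keep choosing `try`), and the iterates 0.7, 0.85, 0.925, ... rise toward it without ever exceeding it. A lower bound alone gives no indication of how close it is to the optimum, which is one motivation for the two-sided bounds the abstract describes.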
List Of Authors: Ho, Qi Heng and Feather, Martin and Rossi, Federico and Sunberg, Zachary and Lahijanian, Morteza
Code Url: https://github.com/aria-systems-group/HSVI-RP
Submission Number: 746