Chance-Constrained POMDP Planning with Learned Neural Network Surrogates

Published: 24 Jun 2024 · Last Modified: 01 May 2025 · IJCAI TIDMwFM 2024 Oral · License: CC BY 4.0
Keywords: CC-POMDPs, safe planning, Monte Carlo tree search, failure probability estimation
TL;DR: We introduce ConstrainedZero for CC-POMDPs that learns approximations of the value function, action-selection policy, and failure probability to replace heuristics in MCTS and adapts the target level of safety during search.
Abstract: To plan safely in uncertain environments, agents must balance utility with safety constraints. Safe planning problems can be modeled as a chance-constrained partially observable Markov decision process (CC-POMDP) and solutions often use expensive rollouts or heuristics to estimate the optimal value and action-selection policy. This work introduces the ConstrainedZero policy iteration algorithm that solves CC-POMDPs in belief space by learning neural network approximations of the optimal value and policy with an additional network head that estimates the failure probability given a belief. This failure probability guides safe action selection during online Monte Carlo tree search (MCTS). To avoid overemphasizing search based on the failure estimates, we introduce $\Delta$-MCTS, which uses adaptive conformal inference to update the failure threshold during planning. The approach is tested on a safety-critical POMDP benchmark, an aircraft collision avoidance system, and the sustainability problem of safe CO$_2$ storage. Results show that by separating safety constraints from the objective we can achieve a target level of safety without optimizing the balance between rewards and costs.
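The abstract's key mechanism is adapting the failure threshold during search via adaptive conformal inference (ACI). As a minimal sketch of the general ACI update rule (Gibbs and Candès style), the threshold is nudged up or down based on whether the observed violation indicator exceeds the target rate. The function name, learning rate, and toy indicator sequence below are illustrative assumptions, not the paper's exact $\Delta$-MCTS implementation.

```python
def aci_update(delta, err, target_alpha, lr=0.01):
    """One ACI step (illustrative, not the paper's exact rule):
    `err` is 1 if the safety constraint was violated at this step, else 0.
    The threshold `delta` shrinks after a violation and relaxes otherwise,
    so the long-run violation rate tracks `target_alpha`."""
    return delta + lr * (target_alpha - err)

# Toy usage: start from an admissible failure probability of 0.1
# and process a hypothetical sequence of violation indicators.
delta = 0.1
target_alpha = 0.05
for err in [1, 0, 0, 1, 0]:
    delta = aci_update(delta, err, target_alpha)
```

The design point this illustrates: because the update depends only on the running violation indicators, the target level of safety can be tracked online during tree search without re-tuning a reward-cost trade-off, which matches the paper's claim of separating safety constraints from the objective.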
Submission Number: 6