
\textbf{Related Work \quad}
% \begin{itemize}
%     \item The problem of probabilistic reachability is a form of Stochastic Shortest Path problem.
%     \item Goal POMDPs: Goal HSVI, Almost-sure reachability: Chatterjee
%     \item POMDP Probabilistic Reachability
% \end{itemize}
Algorithms for solving POMDPs with infinite-horizon discounted objectives have been studied extensively in the literature \citep{Lauri2023POMDPRobotics, shani2013survey}. 
A major bottleneck of these methods is the curse of dimensionality and the curse of history. 
To alleviate them, point-based methods such as Perseus~\citep{spaan2005perseus}, HSVI2~\citep{Smith2005HSVI2}, SARSOP~\citep{Kurniawati-RSS08-SARSOP}, and PLEASE~\citep{zhang2015please} approximate the value function over a set of beliefs that they expand incrementally by exploring the space of reachable beliefs. These algorithms have proven effective on moderately large discounted-sum POMDPs. They have also been applied to the MRPP and to POMDPs with LTL specifications~\citep{Bouton2020PointBasedModelChecking, kalagarla22a, yu2023trust}, but their theoretical guarantees hold only for discounted versions of these problems. In this work, we study the drawbacks of point-based methods for the MRPP and propose a point-based algorithm that overcomes these drawbacks while providing theoretical soundness guarantees.

It has been shown that MRPP is a special type of Stochastic Shortest Path Problem (SSP) with a specific non-negative reward structure~\citep{de1998formal}. 
\citet{Horak2018GoalHSVI} introduce a related problem, the Goal-POMDP, which is an SSP with only positive costs and a set of goal states. The objective of a Goal-POMDP is to minimize the expected total cost incurred until the goal set is reached. Similar to this work, \citet{Horak2018GoalHSVI} propose extensions of HSVI2, in their case to solve Goal-POMDPs. However, Goal-POMDPs differ from MRPP: the assumptions that all costs are positive and that the goal set is reachable from every state do not hold for MRPP. Hence, algorithms for Goal-POMDPs cannot be used directly to solve MRPP.
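
To make the SSP view of MRPP concrete, the display below sketches one standard way of encoding reachability as an undiscounted total-reward objective. The notation is assumed here for illustration only and may differ from the rest of the paper: state set $S$, action set $A$, goal set $G \subseteq S$, transition function $T$, a fresh absorbing sink state $s_\bot \notin S$, and a policy $\pi$ executed from an initial belief $b_0$. Every goal state is redirected to $s_\bot$, so a reward of $1$ is collected exactly once along a run if and only if the run reaches $G$:
\begin{equation*}
R(s,a) = \begin{cases} 1 & \text{if } s \in G,\\ 0 & \text{otherwise,} \end{cases}
\qquad
T(s_\bot \mid s,a) = 1 \ \text{ for all } s \in G \cup \{s_\bot\},\ a \in A,
\end{equation*}
\begin{equation*}
\mathbb{P}^{\pi}_{b_0}\big[\text{reach } G\big] = \mathbb{E}^{\pi}_{b_0}\Big[\sum_{t=0}^{\infty} R(s_t,a_t)\Big].
\end{equation*}
Under this encoding, every non-goal state carries reward $0$, so, read as costs, the MRPP does not satisfy the positive-cost assumption of Goal-POMDPs.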

The works closest to ours are belief-based approaches \citep{norman2017verification, Bork2022underapproximating, Bork2020overapp}. To the best of our knowledge, the only method that computes two-sided bounds with convergence guarantees is the approach implemented in PRISM~\citep{norman2017verification}. However, this approach does not scale to larger POMDPs and computes loose bounds in practice. \citet{Bork2022underapproximating} compute under-approximations by expanding beliefs in a breadth-first manner and adding them to a constructed belief MDP according to a heuristic. Beliefs that are not added are \emph{cut off}, and values from a pre-computed policy are used for these cut-off beliefs. Similarly, \citet{Bork2020overapp} compute over-approximations using breadth-first belief exploration and cut-offs. However, their techniques rely heavily on good pre-computed policies and require searching a large part of the belief space to obtain such policies. In contrast, our algorithm performs heuristic trials in a depth-first manner guided by two-sided bounds, which directs the search more efficiently.

An orthogonal approach to MRPP on POMDPs is to directly compute policies represented as Finite State Controllers (FSCs) \citep{Andriushchenko2022InductivePOMDP}. \citet{Andriushchenko2022InductivePOMDP} use inductive synthesis to search for FSCs. Their method can quickly find good policies with small memory, but it struggles when good policies require larger memory. \citet{andriushchenko2023symbiotic} propose an approach that integrates inductive synthesis with belief-based approaches to combine the strengths of both techniques. In general, these methods compute under-approximations in an anytime manner and cannot detect whether or when a near-optimal policy has been found. Our method, in contrast, computes both lower and upper bounds, which bound the sub-optimality of the computed policy and certify when a near-optimal policy has been found.