\subsection*{Theoretical Analysis}

Here, we analyze the theoretical properties of HSVI-RP without observation heuristic randomization, specifically its soundness and asymptotic convergence.

Although our algorithm is inspired by the trial-based HSVI2, its properties differ because of the indefinite horizon of MRPP and the modifications made in HSVI-RP. The proof of $\epsilon$-convergence for HSVI2 relies on discounting to bound the required trial depth and the number of iterations; discounting also ensures that loops are not an issue. In contrast, HSVI-RP's asymptotic convergence of the lower bound for MRPP stems from our proposed graph representation, termination criteria, and trial-based expansion heuristics, which allow adequate exploration of the belief MDP.

\begin{lemma}[Soundness]
    \label{lemma:soundness}
    At any iteration of HSVI-RP, it holds for all $b_t \in B$ that $V^L(b_t)  \leq V^*(b_t) \leq V^U(b_t)$.
\end{lemma}
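
As an illustrative consequence of Lemma~\ref{lemma:soundness} (a short sketch, assuming only that the initial belief $b_0$ belongs to $B$), subtracting $V^L(b_0)$ from each term of the inequality shows that the gap between the two bounds at $b_0$ limits how far the lower bound can be from the optimal value:
\begin{align*}
    0 \leq V^*(b_0) - V^L(b_0) \leq V^U(b_0) - V^L(b_0).
\end{align*}
Hence, whenever the bounds happen to meet at $b_0$, the lower bound equals $V^*(b_0)$; the lemma by itself does not imply that the gap closes.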

\begin{theorem}[Asymptotic Convergence]
    \label{theorem:convergence}
    Let the action selection radius in Eq.~\eqref{eq:actionselection} be $\xi = 1$. Further, let $V^L_n$ denote the lower bound obtained from HSVI-RP after trial iteration $n$. Assume that there exists an optimal belief-based policy $\pi^*$ computable with finite memory. Then,
    \begin{align*}
        \lim_{n \rightarrow \infty} \left[P^{\pi^*}_{\mathcal{M}}(\lozenge \mathrm{T}) - V_n^L(b_0)\right]  = 0.
    \end{align*}
\end{theorem}
Proofs of Lemma~\ref{lemma:soundness} and Theorem~\ref{theorem:convergence} can be found in the Appendix.

In general, optimal belief-based policies for POMDPs with indefinite horizon require infinite memory, and the corresponding decision problem is undecidable~\citep{MADANI2003Undecidability}. Therefore, the finite memory assumption is a practical requirement for any computed policy.



We leave a complete analysis of the convergence of the upper bound to future work. Upper bound convergence can be guaranteed in certain cases, e.g., when the POMDP induces a finite belief MDP. However, there are MRPPs in which the upper bound does not converge: in these problems, lowering the upper bound requires exploring an infinite number of beliefs, which is related to the undecidability result for indefinite horizon POMDPs. In our experimental evaluations, the upper bound converged (quickly) in some problems but not in others.