\section{Conclusion and Future Work}

We introduced and analyzed the \emph{stochastic self-destruction} behavior of C-POMDP policies, and showed that C-POMDPs may not exhibit optimal substructure. We proposed a new formulation, RC-POMDPs, and presented an algorithm for solving them. Our results show that C-POMDP policies exhibit unintuitive behavior that is not present in RC-POMDP policies, and that our algorithm effectively computes policies for RC-POMDPs. We believe RC-POMDPs are an alternative formulation that can be more desirable for some applications.

For future work, we plan to study which models exhibit strong cases of stochastic self-destruction, and to develop additional metrics that signal it. We also plan to analyze classes (or conditions) of RC-POMDPs that are approximable, and to design algorithms that converge in such cases.

Further, our offline policy tree search algorithm can benefit from better policy search heuristics and more efficient policy representations (e.g., finite-state controllers). We also plan to explore other approaches, such as searching for finite-state controllers directly \citep{Wray2022pga} and online tree search approximations \citep{Lee2018ccpomcp}.

Finally, we have shown that RC-POMDPs can provide more desirable policies than C-POMDPs, but their cost constraints are still imposed in expectation. For some applications, probabilistic or risk-measure constraints may be more desirable than expectation constraints. Such formulations can also benefit from the recursive constraints that we propose for RC-POMDPs.
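
As an illustrative sketch only, the three constraint types mentioned above can be contrasted as follows. The notation here is assumed for illustration and is not taken from our formulation: $C = \sum_{t=0}^{\infty} \gamma^t c(s_t, a_t)$ denotes the discounted cumulative cost incurred by a policy $\pi$ from an initial belief $b_0$, $\hat{c}$ is a cost budget, $\delta \in [0,1]$ is a hypothetical risk tolerance, and $\alpha \in (0,1)$ is a hypothetical confidence level, with conditional value-at-risk (CVaR) used as one possible risk measure.
\begin{align*}
\text{expectation:} && \mathbb{E}_{\pi}\left[ C \mid b_0 \right] &\le \hat{c}, \\
\text{probabilistic:} && \Pr\nolimits_{\pi}\left( C > \hat{c} \mid b_0 \right) &\le \delta, \\
\text{risk measure:} && \mathrm{CVaR}_{\alpha}^{\pi}\left( C \mid b_0 \right) &\le \hat{c}.
\end{align*}
The expectation constraint bounds only the average cost, whereas the other two restrict the tail of the cost distribution.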