Keywords: Markov Decision Process
TL;DR: We propose a new general MDP framework and an accompanying algorithm that addresses previously unresolved MDP problems.
Abstract: We propose the Composite Robust Markov Decision Process (CompRMDP), a simple framework that unifies a wide range of decision-making problems, including the robust MDP, the convex MDP, the multi-discount constrained MDP (MD-CMDP), and their combinations. While the CompRMDP objective is non-convex, we prove that, under a mild coverage assumption, such as full support of the initial state distribution, a simple subgradient descent method finds an $\varepsilon$-optimal policy in $\widetilde{\mathcal{O}}(\varepsilon^{-4})$ updates. Furthermore, we introduce a simple technique for ensuring the coverage assumption by perturbing the initial state distribution while preserving the near-optimality of the resulting policy. This single algorithm solves all the captured settings, including the MD-CMDP, which has been a long-standing open problem since Feinberg (2000).
Submission Number: 13
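The abstract mentions enforcing the coverage assumption by perturbing the initial state distribution. As a minimal, hypothetical sketch (not the paper's actual procedure; the function name `perturb_initial_distribution`, the mixing weight `alpha`, and the choice of mixing with the uniform distribution are assumptions), one common way to guarantee full support is to mix the given initial distribution with the uniform distribution:

```python
import numpy as np

def perturb_initial_distribution(rho, alpha):
    """Return the mixture (1 - alpha) * rho + alpha * uniform.

    Hypothetical illustration: with mixing weight alpha in (0, 1], every
    state receives probability at least alpha / |S|, so the perturbed
    distribution has full support (the kind of coverage condition the
    abstract refers to). The paper's actual perturbation is not
    specified here.
    """
    rho = np.asarray(rho, dtype=float)
    n = rho.shape[0]
    return (1.0 - alpha) * rho + alpha * np.ones(n) / n

# Example: an initial distribution with zero mass on two of four states.
rho = [0.7, 0.3, 0.0, 0.0]
rho_eps = perturb_initial_distribution(rho, alpha=0.05)
print(rho_eps)         # every entry is now strictly positive
print(rho_eps.sum())   # still sums to 1 (up to floating point)
```

A small mixing weight keeps the perturbed problem close to the original one, which is consistent with the abstract's claim that near-optimality of the resulting policy is preserved; how small `alpha` must be for a given $\varepsilon$ is a detail left to the paper.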