Keywords: Markov Decision Process
TL;DR: We propose a new general MDP framework and an accompanying algorithm that addresses previously unresolved MDP problems.
Abstract: We propose the Composite Robust Markov Decision Process (CompRMDP), a simple framework that unifies a wide range of decision-making problems, including the robust MDP, the convex MDP, the multi-discount constrained MDP (MD-CMDP), and their combinations. While the CompRMDP objective is non-convex, we prove that, under a mild coverage assumption, such as full support of the initial state distribution, a simple subgradient descent method finds an $\varepsilon$-optimal policy in $\widetilde{\mathcal{O}}(\varepsilon^{-4})$ updates. Furthermore, we introduce a simple technique for ensuring the coverage assumption by perturbing the initial state distribution while preserving the near-optimality of the resulting policy. This single algorithm solves all the captured settings, including the MD-CMDP, which has been a long-standing open problem since Feinberg (2000).
Submission Number: 13
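The abstract mentions enforcing the coverage assumption by perturbing the initial state distribution. As a minimal, hypothetical sketch (not the paper's actual procedure; the function name `perturb_initial_distribution`, the mixing weight `alpha`, and the choice of mixing with the uniform distribution are assumptions), one common way to guarantee full support is to mix the given initial distribution with the uniform distribution:

```python
import numpy as np

def perturb_initial_distribution(rho, alpha):
    """Return the mixture (1 - alpha) * rho + alpha * uniform.

    Hypothetical illustration: with mixing weight alpha in (0, 1], every
    state receives probability at least alpha / |S|, so the perturbed
    distribution has full support (the kind of coverage condition the
    abstract refers to). The paper's actual perturbation is not
    specified here.
    """
    rho = np.asarray(rho, dtype=float)
    n = rho.shape[0]
    return (1.0 - alpha) * rho + alpha * np.ones(n) / n

# Example: an initial distribution with zero mass on two of four states.
rho = [0.7, 0.3, 0.0, 0.0]
rho_eps = perturb_initial_distribution(rho, alpha=0.05)
print(rho_eps)         # every entry is now strictly positive
print(rho_eps.sum())   # still sums to 1 (up to floating point)
```

A small mixing weight keeps the perturbed problem close to the original one, which is consistent with the abstract's claim that near-optimality of the resulting policy is preserved; how small `alpha` must be for a given $\varepsilon$ is a detail left to the paper.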