Optimal Markov Policies for Finite-Horizon Constrained MDPs With Combined Additive and Multiplicative Utilities

Uday M. Kumar, Veeraruna Kavitha, Sanjay P. Bhat, Nandyala Hemachandra

Published: 01 Jan 2023, Last Modified: 06 May 2026, Venue: IEEE Control Systems Letters, License: CC BY-SA 4.0

Abstract: This letter considers the problem of optimizing a finite-horizon constrained Markov decision process (CMDP) where the objective and constraints are sums of additive and multiplicative utilities. To solve this, we construct another CMDP with only additive utilities whose optimal value over a restricted set of policies is equal to that of the original CMDP. Further, we provide a finite-dimensional bilinear program (BLP) whose value equals the CMDP value and whose solution provides the optimal policy. We also suggest an algorithm to solve the proposed BLP.
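To make the kind of objective described in the abstract concrete, the following is a minimal sketch, not the letter's BLP or its algorithm. It assumes one plausible form of a combined utility: the expected sum of per-step rewards plus the expected product of per-step factors, evaluated for a fixed Markov policy by backward recursion. The function name `evaluate_markov_policy`, the arrays `P`, `r`, `m`, and the toy numbers are all hypothetical and chosen only for illustration.

```python
def evaluate_markov_policy(P, r, m, policy, mu0, T):
    """Evaluate a fixed Markov policy on a finite-horizon MDP.

    Assumed (illustrative) inputs:
      P[s][a][s2]     transition probability from s to s2 under action a
      r[s][a]         additive per-step reward
      m[s][a]         multiplicative per-step factor
      policy[t][s][a] probability of action a in state s at time t
      mu0[s]          initial state distribution
      T               horizon length

    Returns (E[sum_t r], E[prod_t m]) under the given policy.
    """
    n_states = len(P)
    v_add = [0.0] * n_states  # terminal value of the additive part
    v_mul = [1.0] * n_states  # terminal value of the multiplicative part

    for t in reversed(range(T)):
        new_add = [0.0] * n_states
        new_mul = [0.0] * n_states
        for s in range(n_states):
            for a, p_a in enumerate(policy[t][s]):
                if p_a == 0.0:
                    continue
                # Expected continuation values after taking action a in state s.
                next_add = sum(P[s][a][s2] * v_add[s2] for s2 in range(n_states))
                next_mul = sum(P[s][a][s2] * v_mul[s2] for s2 in range(n_states))
                new_add[s] += p_a * (r[s][a] + next_add)   # rewards add up
                new_mul[s] += p_a * m[s][a] * next_mul     # factors multiply
        v_add, v_mul = new_add, new_mul

    add_value = sum(mu0[s] * v_add[s] for s in range(n_states))
    mul_value = sum(mu0[s] * v_mul[s] for s in range(n_states))
    return add_value, mul_value


# Toy example (hypothetical numbers): two states, one action, horizon 2.
P = [[[0.5, 0.5]], [[0.0, 1.0]]]
r = [[1.0], [0.0]]
m = [[2.0], [1.0]]
policy = [[[1.0], [1.0]], [[1.0], [1.0]]]
mu0 = [1.0, 0.0]

add_value, mul_value = evaluate_markov_policy(P, r, m, policy, mu0, T=2)
combined = add_value + mul_value  # the assumed "additive plus multiplicative" utility
```

Under these assumptions, a constrained problem would compare such combined values for the objective and for each constraint. The sketch only evaluates a given policy and does not search for an optimal one.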

External IDs: doi:10.1109/lcsys.2023.3283470