Mean-Variance Optimization in Markov Decision Processes

Shie Mannor, John N. Tsitsiklis

ICML 2011 (modified: 16 Jul 2019)

Abstract: We consider finite-horizon Markov decision processes under performance measures that involve both the mean and the variance of the cumulative reward. We show that either randomized or history-based policies can improve performance. We prove that computing a policy that maximizes the mean reward under a variance constraint is NP-hard for some cases and strongly NP-hard for others. Finally, we offer pseudopolynomial exact and approximation algorithms.
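To make the performance measure concrete, here is a minimal Python sketch that evaluates the mean and variance of the cumulative reward for a fixed, possibly randomized Markov policy in a finite-horizon MDP. It is not the authors' algorithm. It uses a hypothetical one-state example to illustrate the claim that randomization can help: under a variance constraint, a randomized policy attains a higher feasible mean than either deterministic choice. The function names, the dictionary encoding of transitions and policies, the toy MDP, and the bound 0.5 are all assumptions made for illustration. The sketch only evaluates a given policy and does not search for an optimal one.

```python
import math
from collections import defaultdict


def cumulative_reward_distribution(horizon, start, transitions, policy):
    """Exact distribution of the total reward collected over `horizon` steps.

    transitions[s][a] -> list of (probability, next_state, reward) triples.
    policy(t, s)      -> dict {action: probability}; a Markov policy that may
                         be randomized. (Both formats are assumptions made for
                         this sketch.)
    Probability mass is propagated over (state, cumulative reward) pairs, so
    with integer rewards the work grows with the range of attainable totals.
    """
    mass = {(start, 0): 1.0}
    for t in range(horizon):
        nxt = defaultdict(float)
        for (s, total), p in mass.items():
            for a, pa in policy(t, s).items():
                for pt, s2, r in transitions[s][a]:
                    nxt[(s2, total + r)] += p * pa * pt
        mass = nxt
    # Marginalize out the final state, keeping only the total reward.
    dist = defaultdict(float)
    for (_, total), p in mass.items():
        dist[total] += p
    return dict(dist)


def mean_variance(dist):
    """Mean and variance of a reward distribution {total: probability}."""
    mean = sum(r * p for r, p in dist.items())
    second_moment = sum(r * r * p for r, p in dist.items())
    return mean, second_moment - mean ** 2


# Hypothetical one-state MDP: "safe" pays 0; "risky" pays 2 or 0 with equal odds.
TRANSITIONS = {
    "s": {
        "safe": [(1.0, "s", 0)],
        "risky": [(0.5, "s", 2), (0.5, "s", 0)],
    }
}

if __name__ == "__main__":
    bound = 0.5  # hypothetical variance constraint: Var[total] <= 0.5
    # Horizon 1. Deterministic choices: safe -> (mean 0, var 0) and
    # risky -> (mean 1, var 1), so the best feasible deterministic mean is 0.
    # Playing "risky" with probability p gives mean p and variance 2p - p^2,
    # which meets the bound for p <= 1 - sqrt(1 - bound).
    p = 1 - math.sqrt(1 - bound)
    policies = {
        "safe": lambda t, s: {"safe": 1.0},
        "risky": lambda t, s: {"risky": 1.0},
        "randomized": lambda t, s: {"risky": p, "safe": 1 - p},
    }
    for name, pol in policies.items():
        m, v = mean_variance(cumulative_reward_distribution(1, "s", TRANSITIONS, pol))
        print(f"{name:>10}: mean={m:.3f} variance={v:.3f}")
```

Tracking (state, cumulative reward) pairs keeps the evaluation exact; with integer rewards, its cost scales with the number of states times the range of attainable totals rather than with the number of histories. This is one standard way to obtain a pseudopolynomial computation; the abstract does not say which construction the paper's algorithms use.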
