Mean-Variance Optimization in Markov Decision Processes

2011 (modified: 16 Jul 2019), ICML 2011, Readers: Everyone
Abstract: We consider finite-horizon Markov decision processes under performance measures that involve both the mean and the variance of the cumulative reward. We show that either randomized or history-based policies can improve performance. We prove that computing a policy that maximizes the mean reward under a variance constraint is NP-hard in some cases and strongly NP-hard in others. Finally, we offer pseudopolynomial exact and approximation algorithms.
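
To make the two quantities in the abstract concrete, the following is a minimal sketch (not the paper's algorithm) that evaluates the mean and variance of the cumulative reward for a fixed, possibly randomized, Markov policy by backward recursion on the first and second moments. The function name `reward_moments`, the array layouts, and the assumption of deterministic rewards `r[s][a]` are illustrative choices, not taken from the paper; history-based policies are not covered by this sketch.

```python
def reward_moments(P, r, pi, horizon):
    """Mean and variance of cumulative reward from each start state.

    Assumed (hypothetical) layouts:
      P[s][a][s2]  transition probability from s to s2 under action a
      r[s][a]      deterministic one-step reward
      pi[t][s][a]  probability of choosing a in state s at step t
    """
    n_states = len(r)
    n_actions = len(r[0])
    m1 = [0.0] * n_states  # first moment of the reward-to-go
    m2 = [0.0] * n_states  # second moment of the reward-to-go
    for t in reversed(range(horizon)):
        new_m1 = [0.0] * n_states
        new_m2 = [0.0] * n_states
        for s in range(n_states):
            for a in range(n_actions):
                # Expected moments of the reward-to-go after the transition
                e1 = sum(P[s][a][s2] * m1[s2] for s2 in range(n_states))
                e2 = sum(P[s][a][s2] * m2[s2] for s2 in range(n_states))
                # E[(r + G)^2] = r^2 + 2 r E[G] + E[G^2]
                new_m1[s] += pi[t][s][a] * (r[s][a] + e1)
                new_m2[s] += pi[t][s][a] * (r[s][a] ** 2 + 2 * r[s][a] * e1 + e2)
        m1, m2 = new_m1, new_m2
    variance = [second - first * first for first, second in zip(m1, m2)]
    return m1, variance


# One state, two actions (reward 0 or 2), uniform randomization, horizon 2.
P = [[[1.0], [1.0]]]
r = [[0.0, 2.0]]
pi = [[[0.5, 0.5]], [[0.5, 0.5]]]
mean, variance = reward_moments(P, r, pi, horizon=2)
```

In this toy instance the two steps are independent, so both the mean and the variance add across steps; a deterministic policy that always picks the second action would instead give a higher mean with zero variance.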