Dynamically Augmented CVaR for MDPs and Uncertainty Quantifications for Robust MDPs Characterizing Risk

Published: 28 Nov 2025, Last Modified: 30 Nov 2025
Venue: NeurIPS 2025 Workshop MLxOR
License: CC BY 4.0
Keywords: Conditional Value-at-Risk, Robust Markov Decision Process, sequential optimization, optimal policy, finite and infinite horizon
TL;DR: The paper introduces the dynamically augmented CVaR risk measure for Markov Decision Processes, which has advantages over other known definitions of CVaR for MDPs.
Abstract: CVaR optimization is an important topic, and defining and optimizing CVaR for sequential decision-making brings additional complications. This paper investigates the relation between CVaR optimization for an MDP with total discounted costs and a specially constructed Robust MDP (RMDP). This RMDP was introduced to the literature 10 years ago. It was proposed for efficient calculation of optimal CVaR values and is broadly used for this purpose. About two years ago it was understood that these calculations yield lower bounds on the minimal static CVaR. This paper provides additional links between static CVaR and the RMDP. It shows that the optimal value of static CVaR corresponds to a characteristic of the RMDP other than its value. Based on this understanding, this paper introduces the Dynamically augmented CVaR (DCVaR) risk measure, which is more natural than static CVaR. Unlike static CVaR, DCVaR does not suffer from time inconsistency. In addition, DCVaR's value function is equal to the value function of the RMDP, and it can be efficiently computed by value iteration. Optimal policies minimizing DCVaR exist, and they can be computed efficiently by the algorithm proposed three years ago. DCVaR has certain similarities with nested CVaR, and it can be viewed as a flexible version of nested CVaR in which tail risk levels can be adjusted depending on achieved gains or losses.
Submission Number: 185