On the Fundamental Limitations of Dual Static CVaR Decompositions in Markov Decision Processes

TMLR Paper5973 Authors

23 Sept 2025 (modified: 29 Sept 2025). Under review for TMLR. License: CC BY 4.0
Abstract: It was recently shown that dynamic programming (DP) methods for finding static CVaR-optimal policies in Markov Decision Processes (MDPs) can fail when based on the dual formulation, yet the root cause of this failure remains unclear. We expand on these findings by shifting focus from policy optimization to the seemingly simpler task of policy evaluation. We show that evaluating the static CVaR of a given policy can be framed as two distinct minimization problems. We introduce a set of "risk-assignment consistency constraints" that must be satisfied for the solutions of these two problems to coincide, and we show that an empty intersection of these constraints is the source of previously observed evaluation errors. Quantifying this evaluation error as the CVaR evaluation gap, we demonstrate that the issues observed when optimizing with the dual-based CVaR DP are explained by the returned policy having a non-zero CVaR evaluation gap. Finally, we use this risk-assignment perspective to prove that the search for a single, uniformly optimal policy on the dual CVaR decomposition is fundamentally limited: we identify an MDP in which no single policy can be optimal across all initial risk levels.
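For readers unfamiliar with the dual CVaR decomposition, the following is a minimal sketch of the standard one-step identity it builds on, written for rewards (lower tail). If a return Z reaches branch i with probability p_i and has conditional distribution Z_i there, then CVaR_α(Z) = min over ξ of Σ_i p_i ξ_i CVaR_{α ξ_i}(Z_i), subject to Σ_i p_i ξ_i = 1 and 0 ≤ ξ_i ≤ 1/α. Here α ξ_i is the risk level assigned to branch i. This is the textbook identity only, not the paper's evaluation procedure or its consistency constraints. The function names, the two-branch example distribution, and the brute-force grid search over ξ are hypothetical choices made for illustration.

```python
from typing import List, Tuple

# A discrete return distribution as (outcome, probability) pairs.
Dist = List[Tuple[float, float]]


def cvar(dist: Dist, alpha: float) -> float:
    """Lower-tail CVaR of a reward: mean of the worst alpha-fraction of outcomes."""
    items = sorted(dist)
    if alpha <= 0.0:
        return items[0][0]  # limit alpha -> 0: the worst outcome
    alpha = min(alpha, 1.0)
    remaining, total = alpha, 0.0
    for z, p in items:
        take = min(p, remaining)  # probability mass taken from this outcome
        total += take * z
        remaining -= take
        if remaining <= 1e-12:
            break
    return total / alpha


def cvar_primal(dist: Dist, alpha: float) -> float:
    """Rockafellar-Uryasev form for rewards: max_t { t - E[(t - Z)_+] / alpha }.

    The objective is concave and piecewise linear in t, so checking the
    outcomes themselves as candidate thresholds is enough.
    """
    return max(
        t - sum(p * max(t - z, 0.0) for z, p in dist) / alpha
        for t, _ in dist
    )


def cvar_two_branch_decomposition(p1: float, z1: Dist, p2: float, z2: Dist,
                                  alpha: float, grid: int = 2000) -> float:
    """Minimise p1*xi1*CVaR_{alpha*xi1}(Z1) + p2*xi2*CVaR_{alpha*xi2}(Z2)
    over risk assignments with p1*xi1 + p2*xi2 = 1 and 0 <= xi_i <= 1/alpha.

    Brute-force grid search over xi1; xi2 then follows from the constraint.
    """
    cap = 1.0 / alpha
    best = float("inf")
    for k in range(grid + 1):
        xi1 = k * cap / grid
        xi2 = (1.0 - p1 * xi1) / p2
        if xi2 < -1e-12 or xi2 > cap + 1e-12:
            continue  # violates the risk-envelope bounds
        value = (p1 * xi1 * cvar(z1, alpha * xi1)
                 + p2 * xi2 * cvar(z2, alpha * xi2))
        best = min(best, value)
    return best


# Hypothetical example: each branch is reached with probability 0.5.
z1 = [(0.0, 0.5), (10.0, 0.5)]
z2 = [(4.0, 0.5), (6.0, 0.5)]
full = [(z, 0.5 * p) for z, p in z1] + [(z, 0.5 * p) for z, p in z2]
alpha = 0.5

print(cvar(full, alpha))                                       # 2.0
print(cvar_primal(full, alpha))                                # 2.0
print(cvar_two_branch_decomposition(0.5, z1, 0.5, z2, alpha))  # 2.0
```

In this example the minimizing assignment is ξ_1 = ξ_2 = 1, i.e. both branches are evaluated at risk level 0.5, and the decomposed value matches the direct static CVaR. The grid search is chosen only for transparency. Any feasible ξ gives a value at least as large as CVaR_α(Z), so a coarse grid can only overestimate the minimum, never underestimate it.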
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Steffen_Udluft1
Submission Number: 5973