Beyond CVaR: Leveraging Static Spectral Risk Measures for Enhanced Decision-Making in Distributional Reinforcement Learning

Published: 01 May 2025, Last Modified: 18 Jun 2025, ICML 2025 poster, CC BY 4.0
Abstract: In domains such as finance, healthcare, and robotics, managing worst-case scenarios is critical, as failure to do so can lead to catastrophic outcomes. Distributional Reinforcement Learning (DRL) provides a natural framework to incorporate risk sensitivity into decision-making processes. However, existing approaches face two key limitations: (1) the use of fixed risk measures at each decision step often results in overly conservative policies, and (2) the interpretation and theoretical properties of the learned policies remain unclear. While optimizing a static risk measure addresses these issues, its use in the DRL framework has been limited to the simple static CVaR risk measure. In this paper, we present a novel DRL algorithm with convergence guarantees that optimizes for a broader class of static Spectral Risk Measures (SRM). Additionally, we provide a clear interpretation of the learned policy by leveraging the distribution of returns in DRL and the decomposition of static coherent risk measures. Extensive experiments demonstrate that our model learns policies aligned with the SRM objective, and outperforms existing risk-neutral and risk-sensitive DRL models in various settings.
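To make the relationship between CVaR and spectral risk measures concrete, here is a minimal sketch that is not taken from the paper or its repository. It assumes returns are given as equally likely samples (where higher is better), and uses the standard fact that a spectral risk measure with a piecewise-constant spectrum is a convex combination of CVaRs at different levels. The function names `cvar` and `spectral_risk` and the example numbers are hypothetical and only illustrate the definition, not the proposed algorithm.

```python
def cvar(samples, alpha):
    """CVaR at level alpha: average of the worst alpha-fraction of returns.

    Assumes `samples` are equally likely returns (higher is better)
    and 0 < alpha <= 1.
    """
    z = sorted(samples)  # worst (lowest) returns first
    n = len(z)
    total = 0.0
    for i, zi in enumerate(z):
        # Sample i covers probability mass on [i/n, (i+1)/n];
        # count only the part that lies inside [0, alpha].
        lo, hi = i / n, (i + 1) / n
        total += max(0.0, min(hi, alpha) - lo) * zi
    return total / alpha


def spectral_risk(samples, levels, weights):
    """Spectral risk measure written as a mixture of CVaRs.

    Illustrative assumption: the spectrum is piecewise constant, so the
    measure equals sum_k weights[k] * CVaR_{levels[k]} with non-negative
    weights that sum to one.
    """
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(w * cvar(samples, a) for a, w in zip(levels, weights))


# Hypothetical returns with one catastrophic outcome.
returns = [-10, 0, 1, 2, 3, 4, 5, 6, 7, 8]
mean_return = cvar(returns, 1.0)       # CVaR at level 1 is the plain mean
tail_return = cvar(returns, 0.2)       # average of the worst 20% of outcomes
blended = spectral_risk(returns, [0.2, 1.0], [0.5, 0.5])
```

In this sketch, a single level with weight 1 recovers plain CVaR, while spreading weight across several levels blends tail sensitivity with average performance, which is the sense in which SRMs form a broader class than CVaR.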
Lay Summary: In critical fields like finance and healthcare, AI systems must be able to handle worst-case scenarios to prevent disasters. Existing methods for managing risk often become overly cautious, reducing their effectiveness, or produce decisions that are hard to interpret. Meanwhile, alternative approaches that avoid these issues tend to be overly simplistic. To address these challenges, we developed a new learning algorithm that adopts a broader and more flexible definition of risk, moving beyond simpler strategies. Importantly, our method also offers a clear interpretation of its decision-making process and underlying risk preferences. Extensive testing shows that our approach not only aligns with the intended risk objectives but also outperforms existing risk-neutral and risk-sensitive methods across a range of scenarios. This work contributes to the development of more intelligent systems that manage risk more effectively and make safer, more reliable decisions in real-world applications.
Link To Code: https://github.com/MehrdadMoghimi/QRSRM
Primary Area: Reinforcement Learning->Online
Keywords: Reinforcement Learning, Distributional Reinforcement Learning, Risk Aversion, Spectral Risk Measures, Time-Consistency
Submission Number: 7314