Risk‑Seeking Reinforcement Learning via Multi‑Timescale EVaR Optimization

TMLR Paper 4774 Authors

02 May 2025 (modified: 18 Jul 2025) · Under review for TMLR · CC BY 4.0
Abstract: Risk sensitivity is pivotal in shaping agents' behavior when navigating uncertainty, driving behavior that diverges from the risk-neutral case. Risk measures such as Value at Risk (VaR) and Conditional Value at Risk (CVaR) have shown promising results in risk-sensitive reinforcement learning. In this paper, we study the incorporation of a relatively new coherent risk measure, Entropic Value at Risk (EVaR), as the objective the agent seeks to optimize. We propose a multi-timescale stochastic approximation algorithm to find an optimal parameterized EVaR policy. Our algorithm facilitates effective exploration of the policy space and robust approximation of the gradient, leading to the optimization of the EVaR objective. We analyze the asymptotic behavior of our proposed algorithm and rigorously evaluate it across various discrete and continuous benchmark environments. The results highlight that the EVaR policy achieves higher cumulative returns and corroborate that EVaR is indeed a competitive risk-seeking objective for RL.
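As background for the abstract's objective, a minimal sketch of an empirical EVaR estimator may help. It uses the standard coherent-risk definition EVaR at confidence level 1 − α as inf over z > 0 of (1/z)·log(E[exp(zX)]/α), estimated here by a plain grid search over z; the function name, the grid, and the sampling setup are illustrative assumptions, not the paper's multi-timescale stochastic approximation scheme.

```python
import numpy as np

def evar(samples, alpha, z_grid=None):
    """Empirical EVaR at confidence level 1 - alpha:

        EVaR = inf_{z > 0} (1/z) * log( E[exp(z * X)] / alpha )

    Sketch only: a grid search over z stands in for the paper's
    multi-timescale optimization of the same inner problem.
    """
    x = np.asarray(samples, dtype=float)
    if z_grid is None:
        z_grid = np.logspace(-3, 1, 400)  # candidate values of z > 0
    vals = []
    for z in z_grid:
        zx = z * x
        m = zx.max()  # log-mean-exp shift for numerical stability
        log_mgf = m + np.log(np.mean(np.exp(zx - m)))
        vals.append((log_mgf - np.log(alpha)) / z)
    return float(np.min(vals))
```

For a standard normal X, the closed-form value is sqrt(-2 ln α) (about 2.45 at α = 0.05), so the estimator can be sanity-checked against samples; the inf over z is what makes EVaR the tightest upper bound of this exponential (Chernoff-type) family, and the same inner minimization is what a risk-sensitive policy-gradient method must track alongside the policy parameters.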
Submission Length: Long submission (more than 12 pages of main content)
Assigned Action Editor: ~Goran_Radanovic1
Submission Number: 4774