Conditional Risk-Averse Constrained Reinforcement Learning

05 Mar 2026 (modified: 30 Apr 2026) · Under review for TMLR · CC BY 4.0
Abstract: In Risk-averse Constrained Reinforcement Learning (RaCRL), the optimal tolerance for risk often depends on a preference over the trade-off between reward and safety. This trade-off is influenced by environmental uncertainty, which is generally difficult to quantify, making its effect on an agent's performance hard to predict at the outset of training. Conventional RaCRL approaches typically train agents under a fixed risk level, set at the beginning of training, yielding an agent with a fixed, often conservative, reward-safety trade-off at deployment time. In this paper, we introduce Conditional Risk-averse Actor Critic (CRAC), a novel algorithm for RaCRL that conditions the agent on risk levels sampled during both exploration and learning. By exploring and learning from diverse experiences across varied risk levels, CRAC generalises effectively across a spectrum of risk preferences, enabling a single agent to be deployed at risk levels chosen by the user. We evaluate CRAC on a suite of environments of increasing difficulty, demonstrating empirically that it generalises effectively across a risk spectrum. CRAC often achieves higher reward than fixed-risk agents, whilst satisfying cost constraints. In cases where CRAC's reward performance is marginally lower than that of a fixed-risk agent, CRAC retains the advantage of a single risk-conditioned policy that generalises to a risk spectrum, reducing training overhead and providing more control over the reward-cost trade-off.
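To make the risk-conditioning idea concrete, below is a minimal sketch of a policy conditioned on a per-episode risk level, in the spirit of the abstract. This is not the paper's actual implementation: the class name `RiskConditionedActor`, the helper `sample_risk_level`, the risk bounds, and the interpretation of the risk level as a CVaR-style confidence parameter are all illustrative assumptions.

```python
# Minimal sketch, assuming risk-conditioning is implemented by appending a
# scalar risk level to the observation fed to the actor. All names and
# hyperparameters here are hypothetical, not taken from the paper.
import torch
import torch.nn as nn


class RiskConditionedActor(nn.Module):
    """Deterministic policy whose input is the state concatenated with a risk level."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + 1, hidden), nn.ReLU(),  # +1 input for the risk level
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Tanh(),
        )

    def forward(self, state: torch.Tensor, risk_level: torch.Tensor) -> torch.Tensor:
        # Condition on the risk level by concatenating it to the observation.
        return self.net(torch.cat([state, risk_level], dim=-1))


def sample_risk_level(low: float = 0.05, high: float = 1.0) -> torch.Tensor:
    # Draw a fresh risk tolerance (e.g. a CVaR-style confidence level) at the
    # start of each episode, so the experience collected covers a spectrum of
    # risk preferences rather than a single fixed level.
    return torch.empty(1).uniform_(low, high)


# During training, a new risk level is sampled per episode; at deployment,
# the user simply fixes risk_level to the preferred value.
actor = RiskConditionedActor(state_dim=8, action_dim=2)
state = torch.randn(8)
risk_level = sample_risk_level()
action = actor(state, risk_level)
```

The design choice sketched here, training one policy over sampled risk levels instead of one policy per fixed level, is what would allow a single agent to be steered along the reward-cost trade-off at deployment time without retraining.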
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~George_Trimponias2
Submission Number: 7779