Towards Safe Reinforcement Learning via Constraining Conditional Value at Risk

Published: 21 Jun 2021, Last Modified: 05 May 2023 · ICML 2021 Workshop AML Poster
Keywords: Reinforcement Learning, Risk Sensitive, Safety, Policy Optimization, Reward Evaluation
TL;DR: We use CVaR as a measure of risk and propose CPPO, a safe RL algorithm that achieves higher overall performance and stronger robustness, both theoretically and experimentally.
Abstract: Though deep reinforcement learning (DRL) has obtained substantial success, it may encounter catastrophic failures due to the intrinsic uncertainty caused by stochastic policies and environment variability. To address this issue, we propose CVaR-Proximal-Policy-Optimization (CPPO), a novel reinforcement learning framework that uses the conditional value-at-risk (CVaR) as a risk measure. We show theoretically that performance degradation under state observation disturbance and transition probability disturbance depends on the range of the disturbance as well as the gap in the value function between different states. Therefore, constraining the value function across states via CVaR can improve the robustness of the policy. Experimental results show that CPPO achieves higher cumulative reward and exhibits stronger robustness against state observation disturbance and transition probability disturbance in environment dynamics across a series of continuous control tasks in MuJoCo.
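
To make the CVaR idea concrete, below is a minimal sketch (not the authors' CPPO implementation) of how an empirical CVaR of per-episode losses can be computed and folded into a penalized training objective. The function names `empirical_cvar` and `cvar_penalized_objective`, and the parameters `alpha`, `budget`, and `lam`, are illustrative assumptions rather than anything specified in the paper.

```python
import numpy as np


def empirical_cvar(losses, alpha=0.1):
    """Empirical CVaR_alpha: mean of the worst alpha-fraction of losses.

    Based on the standard Rockafellar-Uryasev view, where the
    (1 - alpha)-quantile of the losses (the VaR) marks the tail whose
    average is the CVaR.
    """
    losses = np.asarray(losses, dtype=float)
    var = np.quantile(losses, 1.0 - alpha)   # Value-at-Risk: tail threshold
    tail = losses[losses >= var]             # worst alpha-fraction of outcomes
    return tail.mean()


def cvar_penalized_objective(returns, alpha=0.1, budget=50.0, lam=1.0):
    """Hypothetical Lagrangian-style objective: maximize mean return while
    keeping the CVaR of the per-episode loss (negative return) under a budget."""
    returns = np.asarray(returns, dtype=float)
    risk = empirical_cvar(-returns, alpha=alpha)   # CVaR of the loss -R
    return returns.mean() - lam * max(0.0, risk - budget)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Simulated per-episode returns with a heavy lower tail (rare failures).
    episode_returns = np.where(
        rng.random(1000) < 0.05,
        rng.normal(-200.0, 30.0, 1000),
        rng.normal(100.0, 10.0, 1000),
    )
    print("mean return:", episode_returns.mean())
    print("CVaR_0.1 of loss:", empirical_cvar(-episode_returns, alpha=0.1))
    print("penalized objective:", cvar_penalized_objective(episode_returns))
```

In a PPO-style algorithm such as the one described here, a term of this kind would act as a constraint or penalty alongside the usual clipped surrogate loss; the sketch above only illustrates the risk measure itself on sampled returns.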