Policy Gradient Based Entropic-VaR Optimization in Risk-Sensitive Reinforcement Learning

Published: 01 Jan 2022, Last Modified: 15 May 2023, Allerton 2022
Abstract: In risk-sensitive reinforcement learning, it is important to develop algorithms based on risk measures that are both conceptually meaningful and computationally tractable. In this paper, we apply a recently developed risk measure, the entropic value-at-risk (EVaR), to the design of robust RL algorithms under the MDP framework. In particular, we develop a policy gradient method to optimize the risk-sensitive criterion induced by EVaR. Toward this goal, we first derive the gradients of the EVaR-based objective function and then propose a trajectory-based policy gradient method that estimates these gradients and updates the policy until it converges to a locally optimal policy. We prove the convergence of the policy using a stochastic approximation argument. We also provide numerical results to illustrate the proposed algorithms.
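For reference, since the abstract does not spell it out: the entropic value-at-risk of a cost random variable X is defined through its moment generating function. The standard definition, due to Ahmadi-Javid (2012), is:

```latex
% EVaR of a (cost) random variable X at confidence level 1 - alpha,
% defined via the moment generating function E[e^{zX}]:
\[
  \mathrm{EVaR}_{1-\alpha}(X) \;=\; \inf_{z > 0}\,
  \frac{1}{z}\,\ln\!\left( \frac{\mathbb{E}\!\left[ e^{zX} \right]}{\alpha} \right),
  \qquad \alpha \in (0, 1].
\]
```

EVaR is known to be the tightest upper bound on both VaR and CVaR obtainable from the Chernoff inequality, which is what makes it attractive as a coherent yet computationally tractable risk measure.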
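As a rough illustration of what a trajectory-based estimator for the EVaR-induced criterion J(theta, z) = (1/z) log(E_theta[e^{z C(tau)}] / alpha) could look like, here is a minimal NumPy sketch. The function name, interface, and the shifted-exponential stabilization are our assumptions for illustration, not details taken from the paper:

```python
import numpy as np

def evar_objective_and_grads(costs, score_sums, z, alpha):
    """Monte Carlo estimates, from N sampled trajectories, of the objective
        J(theta, z) = (1/z) * log( E[exp(z * C)] / alpha )
    and its gradients w.r.t. the policy parameters theta and the variable z.

    costs:      (N,)   total cost C(tau_i) of each trajectory
    score_sums: (N, d) sum_t grad_theta log pi_theta(a_t | s_t), per trajectory
    """
    c_max = costs.max()
    w = np.exp(z * (costs - c_max))           # tilted weights, shifted for stability
    m = w.mean()                              # estimates E[exp(z*C)] * exp(-z*c_max)
    J = (np.log(m) + z * c_max - np.log(alpha)) / z

    # Score-function (REINFORCE-style) gradient w.r.t. theta:
    #   grad_theta J = E[e^{zC} * score] / (z * E[e^{zC}])
    grad_theta = (w[:, None] * score_sums).mean(axis=0) / (z * m)

    # Gradient w.r.t. z:  dJ/dz = (E[C e^{zC}] / E[e^{zC}] - J) / z
    tilted_mean_cost = (w * costs).mean() / m
    grad_z = (tilted_mean_cost - J) / z
    return J, grad_theta, grad_z
```

In a full trajectory-based method, one would alternate stochastic gradient steps on theta and z (keeping z > 0, e.g., by projection), with step sizes decaying at the rates required by the stochastic approximation analysis.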