Abstract: In existing work on risk-sensitive reinforcement learning (RL), risk measures such as conditional value-at-risk (CVaR) have been widely used to account for uncertainty and to design robust RL algorithms. However, the uncertainty set in the dual representation of CVaR consists of distributions whose Radon-Nikodym derivative with respect to the nominal distribution is constrained to a certain range, which is a less common way to define a neighborhood of distributions in machine learning applications. This paper applies a recently developed risk measure, the entropic value-at-risk (EVaR), to risk-sensitive RL problems. One appealing feature of EVaR is that the uncertainty set in its dual representation consists of distributions whose Kullback-Leibler (KL) divergence from the nominal distribution is less than or equal to a certain level. We address the EVaR optimization problem in Markov decision processes (MDPs) by proposing a value iteration algorithm. Numerical examples are provided to illustrate the practicality of our approach.
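For concreteness, a standard statement of EVaR and its dual representation (following Ahmadi-Javid's original formulation; the notation below is ours, not drawn from this paper): for a random variable $X$ with moment-generating function $M_X(z) = \mathbb{E}_P[e^{zX}]$ and confidence level $1-\alpha$,

$$
\mathrm{EVaR}_{1-\alpha}(X) \;=\; \inf_{z > 0} \frac{1}{z} \ln\!\left(\frac{M_X(z)}{\alpha}\right) \;=\; \sup_{Q \,:\, D_{\mathrm{KL}}(Q \,\|\, P) \,\le\, -\ln \alpha} \mathbb{E}_Q[X],
$$

so the uncertainty set is exactly the KL ball of radius $-\ln\alpha$ around the nominal distribution $P$, in contrast with the density-ratio bound $dQ/dP \le \alpha^{-1}$ that defines the CVaR uncertainty set at the same confidence level.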