Abstract: We develop a stochastic approximation-type algorithm to solve finite state/action, infinite-horizon, risk-aware Markov decision processes. Our algorithm has two loops. The inner loop computes the risk by solving a stochastic saddle-point problem. The outer loop performs Q-learning to compute an optimal risk-aware policy. Several widely investigated risk measures (e.g., conditional value-at-risk, optimized certainty equivalent, and absolute semideviation) are covered by our algorithm. Almost sure convergence and the convergence rate of the algorithm are established. For an error tolerance ε > 0 on the optimal Q-value estimation gap and a learning rate exponent k ∈ (1/2, 1], the overall convergence rate of our algorithm is Ω((ln(1/δε)/ε²)^(1/k) + (ln(1/ε))^(1/(1−k))) with probability at least 1 − δ.
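To make the two-loop structure concrete, here is a minimal Python sketch, not the paper's exact algorithm: the inner loop estimates conditional value-at-risk of the one-step Q-learning target via the Rockafellar–Uryasev representation CVaR_α(Z) = min_η { η + E[(Z − η)_+]/(1 − α) } using stochastic subgradient steps, and the outer loop runs a Q-learning update with that risk estimate as the target. All names (`alpha`, `sample_transition`, step-size choices, and sign conventions for cost vs. reward) are illustrative assumptions, not from the paper.

```python
import numpy as np

def cvar_inner_loop(sample_z, alpha=0.95, iters=200, eta0=0.0):
    """Inner loop: stochastic approximation of CVaR_alpha of draws from sample_z.

    Uses stochastic subgradient descent on eta in the Rockafellar-Uryasev
    representation; a saddle-point formulation is used in the paper, this
    min-only version is a simplified stand-in.
    """
    eta = eta0
    for t in range(1, iters + 1):
        z = sample_z()
        grad = 1.0 - (1.0 / (1.0 - alpha)) * (z > eta)  # subgradient in eta
        eta -= (1.0 / t) * grad                          # diminishing step size
    # Monte Carlo plug-in estimate of the CVaR objective at the final eta
    zs = np.array([sample_z() for _ in range(iters)])
    return eta + np.maximum(zs - eta, 0.0).mean() / (1.0 - alpha)

def risk_aware_q_learning(n_states, n_actions, sample_transition,
                          gamma=0.9, episodes=1000, k=0.8):
    """Outer loop: Q-learning whose target is the inner-loop risk estimate.

    `sample_transition(s, a)` is a hypothetical simulator stub returning one
    (reward, next_state) draw; k in (1/2, 1] is the learning rate exponent.
    """
    Q = np.zeros((n_states, n_actions))
    visits = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s = np.random.randint(n_states)
        a = np.random.randint(n_actions)       # exploratory behavior policy
        visits[s, a] += 1
        lr = 1.0 / visits[s, a] ** k           # polynomial learning rate n^(-k)

        def sample_target(s=s, a=a):
            r, s_next = sample_transition(s, a)
            return r + gamma * Q[s_next].max()

        risk_value = cvar_inner_loop(sample_target)  # inner loop computes the risk
        Q[s, a] += lr * (risk_value - Q[s, a])       # outer Q-learning update
    return Q
```

Note that each outer update calls a full inner loop, which is the source of the compounded rates in the bound above: the inner saddle-point error must shrink with ε while the outer Q-learning iterates contract at a rate governed by k.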