Keywords: Convergence, Policy Gradient, Robust Markov Decision Process
TL;DR: A faster global convergence rate for robust policy gradient on s-rectangular robust MDPs
Abstract: Recently, global convergence has been established for non-robust MDPs with an iteration complexity of $O(\frac{1}{\epsilon})$ for finding an $\epsilon$-optimal policy; a Polyak-Łojasiewicz (PL) condition derived from the performance difference lemma plays a key role in this result. This work extends the performance difference lemma to \texttt{s}-rectangular robust MDPs, from which a PL condition can likewise be derived. We further present a simplified proof of policy gradient convergence in the non-robust case which, combined with the robust performance difference lemma, yields global convergence of robust policy gradient.
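For reference, a minimal sketch of the two non-robust ingredients named above, in one standard form (the notation $d^{\pi}_{\rho}$ for the discounted state-visitation distribution, $A^{\pi}$ for the advantage function, and $\mu$ for the optimization measure is assumed here, not taken from the paper). The performance difference lemma relates the values of two policies $\pi, \pi'$ through the advantage function:
$$V^{\pi'}(\rho) - V^{\pi}(\rho) = \frac{1}{1-\gamma}\,\mathbb{E}_{s \sim d^{\pi'}_{\rho}}\,\mathbb{E}_{a \sim \pi'(\cdot\mid s)}\big[A^{\pi}(s,a)\big].$$
From it one can derive a PL-type (gradient domination) bound on the suboptimality gap, e.g.
$$V^{\pi^*}(\rho) - V^{\pi}(\rho) \;\le\; \frac{1}{1-\gamma}\,\Big\|\frac{d^{\pi^*}_{\rho}}{\mu}\Big\|_{\infty}\,\max_{\bar{\pi}}\;(\bar{\pi}-\pi)^{\top}\nabla_{\pi}V^{\pi}(\mu),$$
which is what drives the $O(\frac{1}{\epsilon})$ rate for projected policy gradient in the non-robust setting; the abstract's contribution is an analogue of the first identity for \texttt{s}-rectangular robust MDPs, from which a robust PL condition follows.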