Towards Faster Global Convergence of Robust Policy Gradient Methods

Navdeep Kumar; Ilnura Usmanova; Kfir Yehuda Levy; Shie Mannor

Published: 20 Jul 2023, Last Modified: 29 Aug 2023, EWRL16, Readers: Everyone

Keywords: Convergence, Policy Gradient, Robust Markov Decision Process

TL;DR: A faster global convergence rate for robust policy gradient in s-rectangular robust MDPs.

Abstract: Recently, global convergence of policy gradient methods has been established for non-robust MDPs, with an iteration complexity of $O(\frac{1}{\epsilon})$ for finding an $\epsilon$-optimal policy. The Polyak-Łojasiewicz (PL) condition, derived from the performance difference lemma, played a key role in that result. This work extends the performance difference lemma to \texttt{s}-rectangular robust MDPs, from which a PL condition can be derived. We further present a simplified proof of policy gradient convergence in the non-robust case, which, together with the robust performance difference lemma, can lead to global convergence of robust policy gradient.
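
For reference, the non-robust ingredients named in the abstract are usually stated as follows. This is a sketch in standard notation, not the paper's own: the reward-maximization convention, the discount factor $\gamma$, the initial distributions $\mu$ and $\rho$, the discounted state-visitation distribution $d^{\pi}_{\mu}$, the advantage function $A^{\pi'}$, and the direct (tabular) policy parameterization are all assumptions, since the abstract fixes none of them. The performance difference lemma reads

$$V^{\pi}(\mu) - V^{\pi'}(\mu) = \frac{1}{1-\gamma}\, \mathbb{E}_{s \sim d^{\pi}_{\mu}}\, \mathbb{E}_{a \sim \pi(\cdot \mid s)} \left[ A^{\pi'}(s,a) \right],$$

and the PL-type (gradient domination) condition commonly derived from it is

$$V^{*}(\rho) - V^{\pi}(\rho) \le \frac{1}{1-\gamma} \left\| \frac{d^{\pi^{*}}_{\rho}}{\mu} \right\|_{\infty} \max_{\bar{\pi}} \, (\bar{\pi} - \pi)^{\top} \nabla_{\pi} V^{\pi}(\mu).$$

The \texttt{s}-rectangular robust counterparts of these statements are the paper's contribution and are not reproduced here.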
