Towards Faster Global Convergence of Robust Policy Gradient Methods

Published: 19 Mar 2024 (Last Modified: 19 Mar 2024) · Tiny Papers @ ICLR 2024 · CC BY 4.0
Keywords: Robust Reinforcement Learning, Markov Decision Processes, Robust Policy Gradient, Global Convergence for Robust MDPs
TL;DR: We establish faster global convergence rates for policy gradient methods on robust MDPs, assuming smoothness of the robust return, which holds in many settings of interest.
Abstract: We establish the global convergence of the policy gradient method for robust Markov Decision Processes (MDPs) under the assumption that the robust return is smooth with respect to the policy. Although restrictive, this smoothness assumption is satisfied in many interesting settings, such as reward-robust MDPs. We also obtain an iteration complexity comparable to that of non-robust MDPs, which is significantly faster than existing rates for robust MDPs.
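To illustrate the reward-robust setting named in the abstract, here is a minimal sketch (not the authors' code; the helpers `simplex_proj` and `policy_gradient`, the interval radius `delta`, and the toy MDP are all illustrative assumptions) of projected policy gradient ascent on a tabular MDP with an (s,a)-rectangular interval reward-uncertainty set. For such a set the adversary's best response is simply the nominal reward shifted down by `delta`, so the robust return coincides with the return of a fixed MDP and is smooth in the policy, which is the situation where the paper's faster rate applies.

```python
# Sketch: projected policy gradient on a reward-robust tabular MDP.
# Assumption: uncertainty set R(s,a) in [r0(s,a) - delta, r0(s,a) + delta],
# so the worst case is r0 - delta and the robust return is a nominal return.
import numpy as np

def simplex_proj(v):
    """Euclidean projection of each row of v onto the probability simplex."""
    u = np.sort(v, axis=-1)[..., ::-1]            # sort each row descending
    css = np.cumsum(u, axis=-1)
    k = np.arange(1, v.shape[-1] + 1)
    rho = (u + (1 - css) / k > 0).sum(axis=-1, keepdims=True)
    theta = (np.take_along_axis(css, rho - 1, axis=-1) - 1) / rho
    return np.maximum(v - theta, 0.0)

def policy_gradient(P, r, pi, gamma, rho0):
    """Exact gradient of the discounted return for a direct-parameterized policy."""
    S = P.shape[0]
    P_pi = np.einsum('sa,sat->st', pi, P)          # state transitions under pi
    r_pi = (pi * r).sum(axis=1)                    # expected reward under pi
    v = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
    q = r + gamma * np.einsum('sat,t->sa', P, v)   # state-action values
    d = np.linalg.solve(np.eye(S) - gamma * P_pi.T, (1 - gamma) * rho0)
    return d[:, None] * q / (1 - gamma), v         # d J / d pi(a|s)

# Toy reward-robust MDP: the adversary's best response is r0 - delta.
rng = np.random.default_rng(0)
S, A, gamma, delta = 4, 3, 0.9, 0.1
P = rng.dirichlet(np.ones(S), size=(S, A))         # transition kernel P[s,a,t]
r0 = rng.uniform(size=(S, A))                      # nominal rewards
r_worst = r0 - delta                               # worst-case rewards
rho0 = np.full(S, 1.0 / S)                         # initial state distribution

pi = np.full((S, A), 1.0 / A)
eta = (1 - gamma) ** 3 / (2 * gamma * A)           # ~1/L for a standard smoothness bound
for _ in range(2000):
    grad, v = policy_gradient(P, r_worst, pi, gamma, rho0)
    pi = simplex_proj(pi + eta * grad)             # projected gradient ascent step
print("robust return:", rho0 @ v)
```

The step size is chosen on the order of the inverse smoothness constant of the return under direct parameterization (roughly (1-γ)^3 / (2γ|A|), a standard bound); this is exactly the regime in which smoothness-based global convergence analyses of projected policy gradient apply.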
Submission Number: 8