Accelerated Policy Gradient for s-rectangular Robust MDPs with Large State Spaces

Published: 02 May 2024, Last Modified: 25 Jun 2024, ICML 2024 Poster, CC BY 4.0
Abstract: The robust Markov decision process (robust MDP) is an important machine learning framework for learning a reliable policy that is robust to environmental perturbations. Despite the empirical success and popularity of policy gradient methods, existing policy gradient methods require an iteration complexity of at least $\mathcal{O}(\epsilon^{-4})$ to converge to the globally optimal solution of s-rectangular robust MDPs with $\epsilon$-accuracy, and they are limited to the deterministic setting with access to exact gradients and to small state spaces, which is impractical in many applications. In this work, we propose an accelerated policy gradient algorithm with iteration complexity $\mathcal{O}(\epsilon^{-3}\ln\epsilon^{-1})$ in the deterministic setting using entropy regularization. Furthermore, we extend this algorithm to the stochastic setting with access only to stochastic gradients and to large state spaces, achieving a sample complexity of $\mathcal{O}(\epsilon^{-7}\ln\epsilon^{-1})$. In addition, our algorithms are the first scalable policy gradient methods for entropy-regularized robust MDPs, which provide an important but underexplored machine learning framework.
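To make the two key ingredients named in the abstract concrete (an entropy-regularized policy gradient objective combined with an accelerated, momentum-style update in the deterministic exact-gradient setting), the following is a minimal illustrative sketch on a toy tabular MDP. It is not the authors' algorithm for s-rectangular robust MDPs; all names and parameter choices (`num_states`, `tau`, `eta`, `beta`, the finite-difference gradient, etc.) are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: generic entropy-regularized softmax policy gradient
# with a Nesterov-style momentum step on a toy tabular MDP. NOT the paper's
# robust-MDP method; all quantities below are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
num_states, num_actions, gamma, tau = 4, 3, 0.9, 0.1  # tau: entropy-regularization weight
P = rng.dirichlet(np.ones(num_states), size=(num_states, num_actions))  # P[s, a, s']
R = rng.uniform(size=(num_states, num_actions))                          # rewards R[s, a]

def softmax(logits):
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def regularized_value(pi):
    # Solve (I - gamma * P_pi) V = r_pi + tau * H(pi) for the entropy-regularized value.
    entropy = -(pi * np.log(pi + 1e-12)).sum(axis=1)
    r_pi = (pi * R).sum(axis=1) + tau * entropy
    P_pi = np.einsum("sa,sat->st", pi, P)
    return np.linalg.solve(np.eye(num_states) - gamma * P_pi, r_pi)

def policy_gradient(theta):
    # Finite-difference gradient of the mean regularized value w.r.t. the logits;
    # this stands in for the exact gradients assumed in the deterministic setting.
    base = regularized_value(softmax(theta)).mean()
    grad, eps = np.zeros_like(theta), 1e-5
    for idx in np.ndindex(theta.shape):
        perturbed = theta.copy()
        perturbed[idx] += eps
        grad[idx] = (regularized_value(softmax(perturbed)).mean() - base) / eps
    return grad

theta = np.zeros((num_states, num_actions))
momentum = np.zeros_like(theta)
eta, beta = 0.5, 0.9  # step size and momentum coefficient (hypothetical choices)
for _ in range(200):
    lookahead = theta + beta * momentum   # Nesterov-style lookahead point
    g = policy_gradient(lookahead)        # ascend the entropy-regularized objective
    momentum = beta * momentum + eta * g
    theta = theta + momentum

print("Final regularized value per state:", regularized_value(softmax(theta)))
```

In the stochastic, large-state-space setting described in the abstract, the exact gradient above would be replaced by a stochastic estimate from sampled trajectories with a function-approximation policy class; this sketch only illustrates the regularized objective and the accelerated update on a small example.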
Submission Number: 6977