Accelerated Policy Gradient: On the Nesterov Momentum for Reinforcement Learning

Published: 19 Jun 2023, Last Modified: 09 Jul 2023
Venue: Frontiers4LCD
Keywords: Reinforcement learning; Nesterov momentum; Accelerated gradient; Global convergence
TL;DR: We present Accelerated Policy Gradient, which leverages Nesterov momentum and provably achieves a faster convergence rate than the standard policy gradient.
Abstract: Policy gradient methods have recently been shown to enjoy global convergence at a $\Theta(1/t)$ rate in the non-regularized tabular softmax setting. Accordingly, one important research question is whether this convergence rate can be further improved using only first-order updates. In this paper, we answer this question from the perspective of momentum by adapting the celebrated Nesterov's accelerated gradient (NAG) method to reinforcement learning (RL), termed *Accelerated Policy Gradient* (APG). To demonstrate the potential of APG for faster global convergence, we start from the bandit setting and formally show that, with the true gradient, APG with softmax policy parametrization converges to an optimal policy at a $\tilde{O}(1/t^2)$ rate. To the best of our knowledge, this is the first characterization of the global convergence rate of NAG in the context of RL. Notably, our analysis relies on one interesting finding: regardless of the initialization, APG eventually enters a locally concave regime, where it can benefit significantly from the momentum, within a finite number of iterations. By means of numerical validation, we confirm that APG exhibits the $\tilde{O}(1/t^2)$ rate in the bandit setting and preserves the $\tilde{O}(1/t^2)$ rate across various Markov decision process instances, showing that APG can significantly improve the convergence behavior over the standard policy gradient.
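To make the setting concrete, here is a minimal sketch of NAG-style momentum applied to a softmax-parametrized policy in a multi-armed bandit with exact (true) gradients. The reward values, step size, and the specific momentum schedule $(t-1)/(t+2)$ are illustrative assumptions, not values taken from the paper:

```python
import numpy as np

def softmax(theta):
    """Numerically stable softmax over policy parameters."""
    z = np.exp(theta - theta.max())
    return z / z.sum()

r = np.array([1.0, 0.8, 0.2])   # assumed per-arm rewards; arm 0 is optimal
theta = np.zeros(3)             # softmax policy parameters
omega = theta.copy()            # look-ahead point carrying the momentum
eta = 0.4                       # assumed step size

for t in range(1, 2001):
    pi = softmax(omega)
    # True policy gradient of the expected reward at the look-ahead point:
    # d/dtheta_a E_pi[r] = pi_a * (r_a - pi . r)
    grad = pi * (r - pi @ r)
    theta_new = omega + eta * grad                               # gradient ascent step
    omega = theta_new + (t - 1) / (t + 2) * (theta_new - theta)  # Nesterov extrapolation
    theta = theta_new

pi = softmax(theta)
print(pi)  # probability mass concentrates on the optimal arm
```

Compared with vanilla policy gradient (which would simply set `theta += eta * grad`), the extrapolation step evaluates the gradient at a point pushed ahead along the previous update direction, which is the mechanism behind the accelerated rate studied in the paper.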
Submission Number: 114