Keywords: deterministic policy gradient, finite-time analysis
TL;DR: This paper establishes finite-time performance guarantees for DPG algorithms
Abstract: The deterministic policy gradient (DPG) method proposed in Silver et al. (2014) has been demonstrated to exhibit superior performance, particularly for applications with multi-dimensional and continuous action spaces. However, it remains unclear whether DPG converges, and if so, how fast it converges and whether it converges as efficiently as other PG methods. In this paper, we provide a theoretical analysis of DPG to answer these questions. We study single-timescale DPG (as is often the case in practice) in both on-policy and off-policy settings, and show that both algorithms attain an $\epsilon$-accurate stationary policy with a sample complexity of $\mathcal{O}(\epsilon^{-2})$. Moreover, we establish the convergence rate for DPG under Gaussian noise exploration, which is widely adopted in practice to improve the performance of DPG. To the best of our knowledge, this is the first non-asymptotic convergence characterization for DPG methods.
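For context, a minimal sketch of the gradient expression underlying the DPG methods analyzed here, as stated in the cited Silver et al. (2014); the notation ($\rho^{\mu}$ for the discounted state distribution, $Q^{\mu}$ for the action-value function) follows that paper rather than this abstract:
$$
\nabla_\theta J(\mu_\theta) \;=\; \mathbb{E}_{s \sim \rho^{\mu}}\!\Big[ \nabla_\theta \mu_\theta(s)\, \nabla_a Q^{\mu}(s,a)\big|_{a=\mu_\theta(s)} \Big],
$$
and, in the Gaussian-noise exploration setting mentioned above, the executed action is perturbed as $a = \mu_\theta(s) + \xi$ with $\xi \sim \mathcal{N}(0, \sigma^2 I)$ (the specific noise model used in the paper is an assumption here).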
Supplementary Material: zip