Rethinking Deep Policy Gradients via State-Wise Policy Improvement

Published: 09 Dec 2020, Last Modified: 05 May 2023 · ICBINB 2020 Poster
Keywords: reinforcement learning, policy gradient, ppo
TL;DR: Given the mismatch between theory and empirical behavior of deep policy gradient methods, this paper proposes a theoretical framework to reinterpret the deep policy gradient.
Abstract: Deep policy gradient methods constitute one of the major frameworks in reinforcement learning and have been shown to improve parameterized policies across a variety of tasks and environments. However, recent studies show that key components of these methods, such as gradient estimates, value predictions, and optimization landscapes, fail to behave as the conceptual framework predicts. This paper investigates the mechanism behind deep policy gradient methods through the lens of state-wise policy improvement. Based on fundamental properties of policy improvement, we propose an alternative theoretical framework that reinterprets the deep policy gradient update as training a binary classifier, with labels provided by the advantage function. This framework sidesteps the statistical difficulties in the gradient estimates and predicted values of the deep policy gradient update. Experimental results corroborate the proposed framework.
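The classifier view described in the abstract can be illustrated with a minimal sketch (not the paper's exact algorithm): for each sampled state-action pair, the sign of the advantage supplies a binary label, and a cross-entropy-style step pushes pi(a|s) up for label 1 and down for label 0. The tabular softmax policy, the synthetic `preferred`-action data, and all hyperparameters below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 4, 2
theta = np.zeros((n_states, n_actions))  # tabular softmax policy parameters

def policy(theta, s):
    """Softmax action distribution for state s."""
    logits = theta[s]
    e = np.exp(logits - logits.max())
    return e / e.sum()

# Hypothetical data: in each state, action preferred[s] has positive
# advantage and the other action has negative advantage.
preferred = np.array([0, 1, 0, 1])
states = rng.integers(0, n_states, size=512)
actions = rng.integers(0, n_actions, size=512)
advantages = np.where(actions == preferred[states], 1.0, -1.0)

lr = 0.1
for _ in range(100):
    grad = np.zeros_like(theta)
    for s, a, adv in zip(states, actions, advantages):
        p = policy(theta, s)
        label = 1.0 if adv > 0 else 0.0      # binary label from the advantage sign
        onehot = np.eye(n_actions)[a]
        # (onehot - p) is the softmax score function grad log pi(a|s);
        # weighting it by (label - pi(a|s)) is a cross-entropy-style step
        # that moves pi(a|s) toward the binary label.
        grad[s] += (label - p[a]) * (onehot - p)
    theta += lr * grad / len(states)
```

After training, each state's policy concentrates on its positive-advantage action, which is the state-wise improvement the classifier interpretation describes.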