Replacing Implicit Regression with Classification in Policy Gradient Reinforcement Learning

Published: 04 Jun 2024, Last Modified: 19 Jul 2024 · Finding the Frame: RLC 2024 Poster · CC BY 4.0
Keywords: reinforcement learning; policy gradient learning; actor-critic algorithms
TL;DR: The policy gradient surrogate loss can be interpreted as a weighted regression problem; we show that reformulating it as a weighted classification problem leads to improved policy gradient learning.
Abstract: Stochastic policy gradient methods are a fundamental class of reinforcement learning algorithms. When using these algorithms for continuous control, it is common to parameterize the policy using a Gaussian distribution. In this paper, we show that the policy gradient with Gaussian policies can be viewed as the gradient of a weighted least-squares objective function. That is, policy gradient algorithms are implicitly implementing a form of regression. Several recent works have shown that reformulating regression problems as classification problems can improve learning. Inspired by these works, we investigate whether replacing this implicit regression with classification can improve the data efficiency and stability of policy learning. We introduce a novel policy gradient surrogate objective for softmax policies over a discretized action space. This surrogate objective uses a form of cross-entropy loss to replace the implicit least-squares loss found in the surrogate loss for Gaussian policies. We extend prior theoretical analysis of this loss to our policy gradient surrogate objective and provide experiments showing that this novel loss improves the data efficiency of stochastic policy gradient learning.
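To make the regression-vs.-classification view concrete, here is a minimal PyTorch sketch of the two kinds of surrogate loss described above. It is illustrative only and not the paper's exact objective: the function names, the fixed-variance Gaussian assumption, and the simple uniform binning of a 1-D action range are assumptions introduced for this example.

```python
import torch
import torch.nn.functional as F

def gaussian_pg_surrogate(mu, actions, advantages, sigma=0.5):
    """Standard surrogate for a fixed-variance Gaussian policy.

    -A * log N(a | mu, sigma^2) equals, up to an additive constant,
    A * (a - mu)^2 / (2 * sigma^2): an advantage-weighted squared-error
    (regression) loss on the policy mean.
    """
    log_prob = -((actions - mu) ** 2) / (2 * sigma ** 2)  # constants dropped
    return -(advantages.detach() * log_prob).mean()

def softmax_pg_surrogate(logits, actions, advantages, low=-1.0, high=1.0):
    """Classification-style surrogate over a discretized 1-D action range.

    Each sampled continuous action is mapped to a bin index, which serves as
    the target class; the loss is an advantage-weighted cross-entropy over
    the softmax policy's logits.
    """
    num_bins = logits.shape[-1]
    # Map each continuous action to the index of its bin.
    bins = ((actions - low) / (high - low) * num_bins).long().clamp(0, num_bins - 1)
    log_probs = F.log_softmax(logits, dim=-1)
    chosen = log_probs.gather(-1, bins.unsqueeze(-1)).squeeze(-1)
    return -(advantages.detach() * chosen).mean()
```

In both cases the gradient of the surrogate with respect to the policy parameters recovers a policy gradient estimate; the difference is whether the per-sample term is a squared error on the Gaussian mean or a cross-entropy on the discretized action's bin.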
Submission Number: 8