Replacing Implicit Regression with Classification in Policy Gradient Reinforcement Learning

Published: 04 Jun 2024, Last Modified: 19 Jul 2024 · Finding the Frame: RLC 2024 Poster · CC BY 4.0
Keywords: reinforcement learning; policy gradient learning; actor-critic algorithms
TL;DR: The policy gradient surrogate loss can be interpreted as a weighted regression problem; we show that reformulating it as a weighted classification problem leads to improved policy gradient learning.
Abstract: Stochastic policy gradient methods are a fundamental class of reinforcement learning algorithms. When using these algorithms for continuous control, it is common to parameterize the policy using a Gaussian distribution. In this paper, we show that the policy gradient with Gaussian policies can be viewed as the gradient of a weighted least-squares objective function. That is, policy gradient algorithms are implicitly implementing a form of regression. Several recent works have shown that reformulating regression problems as classification problems can improve learning. Inspired by these works, we investigate whether replacing this implicit regression with classification can improve the data efficiency and stability of policy learning. We introduce a novel policy gradient surrogate objective for softmax policies over a discretized action space. This surrogate objective uses a form of cross-entropy loss to replace the implicit least-squares loss found in the surrogate loss for Gaussian policies. We extend prior theoretical analysis of this loss to our policy gradient surrogate objective and provide experiments showing that this novel loss improves the data efficiency of stochastic policy gradient learning.
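To make the regression-vs.-classification view concrete, here is a minimal PyTorch sketch of the two kinds of surrogate loss described above. It is illustrative only and not the paper's exact objective: the function names, the fixed-variance Gaussian assumption, and the simple uniform binning of a 1-D action range are assumptions introduced for this example.

```python
import torch
import torch.nn.functional as F

def gaussian_pg_surrogate(mu, actions, advantages, sigma=0.5):
    """Standard surrogate for a fixed-variance Gaussian policy.

    -A * log N(a | mu, sigma^2) equals, up to an additive constant,
    A * (a - mu)^2 / (2 * sigma^2): an advantage-weighted squared-error
    (regression) loss on the policy mean.
    """
    log_prob = -((actions - mu) ** 2) / (2 * sigma ** 2)  # constants dropped
    return -(advantages.detach() * log_prob).mean()

def softmax_pg_surrogate(logits, actions, advantages, low=-1.0, high=1.0):
    """Classification-style surrogate over a discretized 1-D action range.

    Each sampled continuous action is mapped to a bin index, which serves as
    the target class; the loss is an advantage-weighted cross-entropy over
    the softmax policy's logits.
    """
    num_bins = logits.shape[-1]
    # Map each continuous action to the index of its bin.
    bins = ((actions - low) / (high - low) * num_bins).long().clamp(0, num_bins - 1)
    log_probs = F.log_softmax(logits, dim=-1)
    chosen = log_probs.gather(-1, bins.unsqueeze(-1)).squeeze(-1)
    return -(advantages.detach() * chosen).mean()
```

In both cases the gradient of the surrogate with respect to the policy parameters recovers a policy gradient estimate; the difference is whether the per-sample term is a squared error on the Gaussian mean or a cross-entropy on the discretized action's bin.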
Submission Number: 8