Robust Losses for Learning Value Functions

Published: 28 Jan 2022, Last Modified: 13 Feb 2023 · ICLR 2022 Submitted
Keywords: Reinforcement Learning
Abstract: Most value function learning algorithms in reinforcement learning are based on the mean squared (projected) Bellman error. However, squared errors are known to be sensitive to outliers, both skewing the solution of the objective and resulting in high-magnitude and high-variance gradients. Typical strategies to control these high-magnitude updates in RL involve clipping gradients, clipping rewards, rescaling rewards, and clipping errors. Clipping errors is related to using robust losses, like the Huber loss, but as yet no work explicitly formalizes and derives value learning algorithms with robust losses. In this work, we build on recent insights reformulating squared Bellman errors as a saddlepoint optimization problem, and propose a saddlepoint reformulation for a Huber Bellman error and Absolute Bellman error. We show that the resulting solutions have significantly lower error for certain problems and are otherwise comparable, in terms of both absolute and squared value error. We show that the resulting gradient-based algorithms are more robust, for both prediction and control, with less stepsize sensitivity.
One-sentence Summary: Develops two novel robust loss functions for reinforcement learning, the mean absolute Bellman error and the mean Huber Bellman error, and empirically investigates solutions to these losses as well as algorithms for both prediction and control.
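The saddlepoint reformulations referenced in the abstract can be read through the standard Fenchel-conjugate identities below. This is a minimal sketch of that idea, not the paper's exact derivation; the TD-error symbol $\delta$, dual variable $h$, and Huber threshold $\tau$ are illustrative notation and not necessarily the paper's.

$$\tfrac{1}{2}\,\delta^{2} \;=\; \max_{h \in \mathbb{R}} \Big( h\,\delta - \tfrac{1}{2}h^{2} \Big), \qquad |\delta| \;=\; \max_{|h| \le 1} h\,\delta, \qquad H_{\tau}(\delta) \;=\; \max_{|h| \le \tau} \Big( h\,\delta - \tfrac{1}{2}h^{2} \Big),$$

where $H_{\tau}$ is the Huber function with threshold $\tau$. Under this view, swapping the squared Bellman error for the absolute or Huber Bellman error only changes the constraint set on the dual variable $h$, which is what suggests gradient-based primal-dual (saddlepoint) algorithms for the robust losses.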