Beyond Target Networks: Improving Deep $Q$-learning with Functional Regularization

Alexandre Piché; Joseph Marino; Gian Maria Marconi; Valentin Thomas; Christopher Pal; Mohammad Emtiyaz Khan

12 Oct 2021 (modified: 04 May 2025), Deep RL Workshop NeurIPS 2021, Readers: Everyone

Keywords: Q learning, regularization, deep Q learning

TL;DR: We propose a functionally regularized alternative to the squared Bellman error.

Abstract: A majority of recent successes in deep reinforcement learning are based on minimizing the squared Bellman error. Training is often unstable because the target $Q$-values change quickly, and target networks are employed to stabilize it by using an additional set of lagging parameters. Despite their advantages, target networks can inhibit the propagation of newly encountered rewards, which may ultimately slow down training. In this work, we address this issue by augmenting the squared Bellman error with a functional regularizer. Unlike with target networks, the regularization here is explicit, which not only enables us to use up-to-date parameters but also lets us control the regularization. This leads to a fast yet stable training method. Across a range of Atari environments, we demonstrate empirical improvements over target-network-based methods in terms of both sample efficiency and performance. In summary, our approach provides a fast and stable alternative to the standard squared Bellman error.
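The idea of augmenting the squared Bellman error with an explicit regularizer can be illustrated with a small sketch. The abstract does not give the regularizer's exact form, so the version below is an assumption: it builds the bootstrap target from up-to-date $Q$-values and adds a penalty, weighted by a hypothetical coefficient `kappa`, on the squared distance between the current $Q$-values and those of a lagging copy. The function name, argument names, and toy numbers are all illustrative and not taken from the paper.

```python
def regularized_bellman_loss(q_sa, q_next, q_lag_sa, rewards, dones,
                             gamma=0.99, kappa=1.0):
    """Mean squared Bellman error plus an explicit functional penalty.

    q_sa     : current Q-values of the actions taken, one per transition
    q_next   : current Q-values of every action in the next state
               (up to date, not taken from a target network)
    q_lag_sa : Q-values of the same taken actions under lagging parameters
    dones    : 1.0 where the episode ended, else 0.0
    kappa    : assumed regularization weight (hypothetical name)
    """
    total = 0.0
    for q, q_n, q_l, r, d in zip(q_sa, q_next, q_lag_sa, rewards, dones):
        # Bootstrap target built from the up-to-date Q-values.
        target = r + gamma * (1.0 - d) * max(q_n)
        bellman_error = (q - target) ** 2
        # Explicit penalty pulling the current Q-value toward the lagging one.
        penalty = (q - q_l) ** 2
        total += bellman_error + kappa * penalty
    return total / len(q_sa)


# Toy batch of two transitions; the second one is terminal.
q_sa = [1.0, 2.0]
q_next = [[0.5, 1.5], [0.0, 3.0]]
q_lag_sa = [0.5, 2.0]
rewards = [1.0, 0.0]
dones = [0.0, 1.0]

loss = regularized_bellman_loss(q_sa, q_next, q_lag_sa, rewards, dones,
                                gamma=0.5, kappa=2.0)
unregularized = regularized_bellman_loss(q_sa, q_next, q_lag_sa, rewards,
                                         dones, gamma=0.5, kappa=0.0)
print(loss, unregularized)
```

Setting `kappa` to zero recovers the plain squared Bellman error on up-to-date values, which is one way to read the abstract's claim that the strength of the regularization can be controlled. In a deep $Q$-learning setting the target would typically be treated as a constant when differentiating; this plain-Python sketch only evaluates the loss and does not model gradients.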

Supplementary Material: zip

Community Implementations: [4 code implementations](https://www.catalyzex.com/paper/beyond-target-networks-improving-deep-q/code)
