Beyond Target Networks: Improving Deep Q-learning with Functional Regularization

Alexandre Piché, Joseph Marino, Gian Maria Marconi, Christopher Joseph Pal, Mohammad Emtiyaz Khan

2021 (modified: 10 Oct 2021)CoRR 2021Readers: Everyone

Abstract: Much of the recent successes in Deep Reinforcement Learning have been based on minimizing the squared Bellman error. However, training is often unstable due to fast-changing target Q-values, and target networks are employed to regularize the Q-value estimation and stabilize training by using an additional set of lagging parameters. Despite their advantages, target networks are potentially an inflexible way to regularize Q-values which may ultimately slow down training. In this work, we address this issue by augmenting the squared Bellman error with a functional regularizer. Unlike target networks, the regularization we propose here is explicit and enables us to use up-to-date parameters as well as control the regularization. This leads to a faster yet more stable training method. We analyze the convergence of our method theoretically and empirically validate our predictions on simple environments as well as on a suite of Atari environments. We demonstrate empirical improvements over target network based methods in terms of both sample efficiency and performance. In summary, our approach provides a fast and stable alternative to replace the standard squared Bellman error

0 Replies