Bridging the Gap Between Target Networks and Functional Regularization

Alexandre Piché; Valentin Thomas; Joseph Marino; Rafael Pardinas; Gian Maria Marconi; Christopher Pal; Mohammad Emtiyaz Khan

Published: 06 Sept 2023, Last Modified: 17 Sept 2024, Accepted by TMLR

Abstract: Bootstrapping is behind much of the success of deep Reinforcement Learning. However, learning the value function via bootstrapping often leads to unstable training due to fast-changing target values. Target Networks are employed to stabilize training by using an additional set of lagging parameters to estimate the target values. Despite the popularity of Target Networks, their effect on the optimization is still misunderstood. In this work, we show that they act as an implicit regularizer, which can be beneficial in some cases but also has disadvantages: it is inflexible and can result in instabilities, even when vanilla TD(0) converges. To overcome these issues, we propose an explicit Functional Regularization alternative that is flexible and is a convex regularizer in function space, and we theoretically study its convergence. We conduct an experimental study across a range of environments, discount factors, and degrees of off-policiness in data collection to investigate the effectiveness of the regularization induced by Target Networks and Functional Regularization in terms of performance, accuracy, and stability. Our findings emphasize that Functional Regularization can be used as a drop-in replacement for Target Networks and results in performance improvements. Furthermore, adjusting both the regularization weight and the network update period in Functional Regularization can yield further performance improvements compared to solely adjusting the network update period, as is typically done with Target Networks. Our approach also enhances the ability of networks to recover accurate $Q$-values.
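To make the contrast in the abstract concrete, the following is a minimal sketch of the two per-transition losses, not the authors' implementation. It assumes a Q-learning style bootstrap target and a squared-difference penalty toward the lagging network; the function names, argument names, and the weight `kappa` are hypothetical and chosen only for illustration. The linked repository is the reference for the actual method.

```python
def td_loss_target_network(q_sa, q_lagging_next, reward, gamma):
    """Squared TD error whose bootstrap target comes from the lagging network.

    q_sa: online network's value for the sampled (state, action).
    q_lagging_next: lagging network's values for every action in the next state.
    """
    target = reward + gamma * max(q_lagging_next)
    return (q_sa - target) ** 2


def td_loss_functional_reg(q_sa, q_online_next, q_lagging_sa, reward, gamma, kappa):
    """Squared TD error with an up-to-date target plus an explicit penalty.

    Assumption: the regularizer is kappa * (Q(s, a) - Q_lagging(s, a))^2.
    In a gradient-based setup the target would be treated as a constant.
    """
    target = reward + gamma * max(q_online_next)
    td_term = (q_sa - target) ** 2
    reg_term = kappa * (q_sa - q_lagging_sa) ** 2
    return td_term + reg_term


if __name__ == "__main__":
    # Same transition scored under both losses (illustrative numbers only).
    print(td_loss_target_network(1.0, [0.5, 2.0], reward=1.0, gamma=0.9))
    print(td_loss_functional_reg(1.0, [0.0, 1.0], 0.5, reward=0.0, gamma=0.5, kappa=2.0))
```

In this sketch, the lagging network enters the first loss only through the target, whereas in the second it enters only through the penalty term, so its influence can be tuned with `kappa` independently of how often the lagging parameters are refreshed.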

Submission Length: Regular submission (no more than 12 pages of main content)

Code: https://github.com/AlexPiche/fr-tmlr

Supplementary Material: zip

Assigned Action Editor: ~Amir-massoud_Farahmand1

License: Creative Commons Attribution 4.0 International (CC BY 4.0)

Submission Number: 1215
