Wide Neural Network Training Dynamics for Reinforcement Learning

16 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: temporal difference learning, training dynamics, reinforcement learning
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: While deep reinforcement learning (RL) has demonstrated remarkable empirical success, certain aspects of the training of deep RL agents remain poorly understood, and many RL algorithms require additional heuristic ingredients to be practically useful. In contrast to supervised learning, RL algorithms typically do not have access to ground-truth labels, which makes the training setup more challenging. In this work, we analyze the training dynamics of overparametrized, infinitely wide value function networks trained through temporal difference updates, extending previous results from neural tangent kernel approaches in supervised learning. We derive closed-form expressions for the training dynamics of common temporal difference policy evaluation methods and analyze, in the infinite-width limit, the uncertainty quantification provided by ensembling, a common heuristic measure of uncertainty in RL. We validate our analytically derived dynamics on a toy environment, where we find good agreement with real neural networks. We also evaluate our methods on the classic cart-pole control environment and find that the predictions and uncertainty quantification of our analytical solutions outperform those made by true ensembles trained via gradient descent.
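For intuition, the following is a minimal sketch of linearized (NTK-regime) semi-gradient TD(0) policy evaluation; the notation ($\Theta$, $P$, $v_*$) and the assumption that successor values can be written as $P\,v_t$ are illustrative and not taken from the submission itself. With value readouts $v_t = V_{\theta_t}(S)$ on a fixed state set $S$, rewards $r$, discount $\gamma$, learning rate $\eta$, and the NTK Gram matrix $\Theta = \Theta(S, S)$ frozen at initialization, the continuous-time semi-gradient flow yields the linear ODE

\[ \dot v_t \;=\; \eta\,\Theta\,\bigl(r - (I - \gamma P)\,v_t\bigr), \]

with closed-form solution

\[ v_t \;=\; v_* \;+\; e^{-\eta\,\Theta\,(I - \gamma P)\,t}\,\bigl(v_0 - v_*\bigr), \qquad v_* = (I - \gamma P)^{-1} r, \]

valid whenever the eigenvalues of $\Theta\,(I - \gamma P)$ have positive real part. Closed-form training dynamics of this general type are what the abstract refers to, specialized there to the particular temporal difference variants the paper considers.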
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: zip
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 641