On the role of overparameterization in off-policy Temporal Difference learning with linear function approximation

Valentin Thomas

Published: 31 Oct 2022, Last Modified: 22 Jan 2023, NeurIPS 2022 Accept, Readers: Everyone

Keywords: reinforcement learning, optimization, random matrix theory, random graphs, overparameterization

TL;DR: We study the role of overparameterization in Temporal Difference (TD) learning and how it affects optimization.

Abstract: Much of the recent success of deep learning can be attributed to scaling up networks to the point where they are often vastly overparameterized. Understanding the role of overparameterization is therefore increasingly important. While predictive theories have been developed for supervised learning, little is known about the reinforcement learning case. In this work, we take a theoretical approach and study the role of overparameterization in off-policy Temporal Difference (TD) learning in the linear setting. We leverage tools from random matrix theory and random graph theory to characterize the spectrum of the TD operator. We use this result to study the stability and optimization dynamics of TD learning as a function of the number of parameters.
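To make the objects in the abstract concrete, here is a minimal NumPy sketch. It assumes the standard expected off-policy linear TD(0) update θ ← θ + α(b − Aθ), with A = Φ^T D (I − γP) Φ and b = Φ^T D r, where Φ is the n_states × k feature matrix, P the target-policy transition matrix and D the diagonal matrix of the behaviour-policy state distribution. The paper's exact definition of the TD operator may differ. The random instance (Gaussian features scaled by 1/sqrt(k), a random row-stochastic P, a uniform behaviour distribution) and the helper names `td_matrix`, `random_td_matrix` and `is_stable` are hypothetical illustrations, not the authors' construction. The sketch only computes the spectrum of A for a few feature counts k and applies the usual stability test: for all sufficiently small step sizes α > 0, the iterates converge from every initialisation exactly when every eigenvalue of A has strictly positive real part.

```python
import numpy as np


def td_matrix(phi, P, d, gamma):
    """Expected linear TD(0) matrix A = Phi^T D (I - gamma P) Phi.

    phi: (n_states, k) features; P: (n_states, n_states) target-policy
    transitions; d: behaviour-policy state distribution (diagonal of D).
    """
    n = P.shape[0]
    return phi.T @ np.diag(d) @ (np.eye(n) - gamma * P) @ phi


def random_td_matrix(n_states, k, gamma, seed=0):
    """Hypothetical random instance: Gaussian features, random row-stochastic P,
    uniform behaviour distribution (in general not P's stationary distribution,
    so the problem is off-policy)."""
    rng = np.random.default_rng(seed)
    P = rng.random((n_states, n_states))
    P /= P.sum(axis=1, keepdims=True)
    d = np.full(n_states, 1.0 / n_states)
    phi = rng.standard_normal((n_states, k)) / np.sqrt(k)
    return td_matrix(phi, P, d, gamma)


def is_stable(A, tol=1e-10):
    """theta <- theta + alpha * (b - A theta) converges from every start for all
    small enough alpha > 0 iff every eigenvalue of A has real part > 0."""
    return bool(np.all(np.linalg.eigvals(A).real > tol))


if __name__ == "__main__":
    n_states, gamma = 50, 0.99
    for k in (5, 25, 50, 100, 200):
        A = random_td_matrix(n_states, k, gamma)
        eig = np.linalg.eigvals(A)
        rank = int(np.linalg.matrix_rank(A))
        print(f"k={k:4d} rank(A)={rank:3d} "
              f"min Re(eig)={eig.real.min():+.2e} stable={is_stable(A)}")
```

One design point worth noting: when k > n_states, A has rank at most n_states, so it has at least k − n_states zero eigenvalues and the strict test above reports the instance as unstable. In that regime every update α(b − Aθ) lies in the row space of Φ, so the component of θ in the null space of Φ is simply never changed; a finer analysis would look only at the nonzero part of the spectrum.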

Supplementary Material: pdf