The Onset of Variance-Limited Behavior for Networks in the Lazy and Rich Regimes

Published: 01 Feb 2023, 19:23, Last Modified: 01 Feb 2023, 19:23
ICLR 2023 poster
Readers: Everyone
Keywords: Feature Learning, Neural Tangent Kernel, Scaling Laws, Deep Ensembles
TL;DR: Empirical study of neural networks in the overparameterized regime shows how finite-width effects are brought on by initialization variance as sample size grows.
Abstract: For small training set sizes $P$, the generalization error of wide neural networks is well-approximated by the error of an infinite-width neural network (NN), either in the kernel or mean-field/feature-learning regime. However, at a critical sample size $P^*$, the generalization of the finite-width network begins to worsen compared to the infinite-width performance. In this work, we empirically study the transition from the infinite-width behavior to this variance-limited regime as a function of sample size $P$ and network width $N$. We find that finite-size effects can become relevant at very small dataset sizes, with $P^* \sim \sqrt{N}$ for polynomial regression with ReLU networks. We discuss the source of this finite-size behavior based on the variance of the NN's final neural tangent kernel (NTK). We then show how this transition can be pushed to larger $P$ by enhancing feature learning or by ensemble averaging the network. We find that the learning curve for regression with the final NTK is an accurate approximation of the NN learning curve. Using this, we provide a toy model which also exhibits $P^* \sim \sqrt{N}$ scaling and has $P$-dependent benefits from feature learning.
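Illustrative sketch (not from the paper): the abstract's claim that ensemble averaging pushes the variance-limited transition to larger $P$ can be seen in a minimal bias-variance simulation. The code below assumes, purely for illustration, that a width-$N$ network's prediction equals a fixed infinite-width prediction plus Gaussian initialization noise with variance $1/N$. The function names, the noise model, and all numerical values are hypothetical and are not the authors' toy model.

```python
import random
import statistics


def finite_width_prediction(rng, f_inf, width):
    """Hypothetical stand-in for one trained finite-width network.

    Assumption: prediction = infinite-width prediction + initialization
    noise with variance 1/width. This is NOT the paper's model.
    """
    return f_inf + rng.gauss(0.0, 1.0) / width ** 0.5


def ensemble_prediction(rng, f_inf, width, k):
    """Average the predictions of k independently initialized networks."""
    return statistics.fmean(
        finite_width_prediction(rng, f_inf, width) for _ in range(k)
    )


def mean_squared_error(rng, target, f_inf, width, k, trials=20000):
    """Monte Carlo estimate of the squared error of a k-member ensemble."""
    return statistics.fmean(
        (ensemble_prediction(rng, f_inf, width, k) - target) ** 2
        for _ in range(trials)
    )


rng = random.Random(0)

# Assumed values: squared bias = (1.0 - 0.9)^2 = 0.01, and the
# single-network variance is 1/width = 0.01.
target, f_inf, width = 1.0, 0.9, 100

# Under the assumed noise model the expected error is bias^2 + variance / k,
# so ensembling shrinks the variance term but leaves the bias untouched.
single = mean_squared_error(rng, target, f_inf, width, k=1)
ensembled = mean_squared_error(rng, target, f_inf, width, k=10)

print(f"single network error:   {single:.4f}")
print(f"10-member ensemble err: {ensembled:.4f}")
```

Under these assumptions the ensemble error approaches the squared bias (the infinite-width error in this caricature) as the number of members grows, which is the sense in which averaging over initializations delays the onset of variance-limited behavior.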
Supplementary Material: zip
Please Choose The Closest Area That Your Submission Falls Into: Deep Learning and representational learning