Neural tangent kernel eigenvalues accurately predict generalization

Published: 28 Jan 2022, Last Modified: 22 Oct 2023 · ICLR 2022 Submitted · Readers: Everyone
Keywords: deep learning, generalization, neural tangent kernel, kernel regression, inductive bias
Abstract: Finding a quantitative theory of neural network generalization has long been a central goal of deep learning research. We extend recent results to demonstrate that, by examining the eigensystem of a neural network's "neural tangent kernel," one can predict its generalization performance when learning arbitrary functions. Our theory accurately predicts not only test mean-squared-error but all first- and second-order statistics of the network's learned function. Furthermore, using a measure quantifying the "learnability" of a given target function, we prove a new "no free lunch" theorem characterizing a fundamental tradeoff in the inductive bias of wide neural networks: improving a network’s generalization for a given target function must worsen its generalization for orthogonal functions. We further demonstrate the utility of our theory by analytically predicting two surprising phenomena --- worse-than-chance generalization on hard-to-learn functions and nonmonotonic error curves in the small data regime --- which we subsequently observe in experiments. Though our theory is derived for infinite-width architectures, we find it agrees with networks as narrow as width 20, suggesting it is predictive of generalization in practical neural networks.
One-sentence Summary: We derive a predictive theory of generalization for wide neural networks and empirically confirm its accuracy.
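
To make the abstract's claim concrete, here is a minimal NumPy sketch of the kind of kernel-eigensystem calculation it describes, written in the standard "eigenlearning" style for kernel regression (an implicit effective regularization κ, per-mode learnabilities λ_i/(λ_i + κ), and a variance-amplification factor in the predicted test MSE). The equations, function names, and toy spectrum below are assumptions for illustration, not code or notation taken from the paper or its supplementary material.

```python
import numpy as np

def effective_regularization(eigvals, n, ridge=0.0, tol=1e-12):
    """Solve n = ridge/kappa + sum_i lam_i/(lam_i + kappa) for kappa > 0.
    Assumed form of the implicit constant in eigenlearning-style theories;
    requires n < number of eigenmodes when ridge == 0."""
    f = lambda kappa: ridge / kappa + np.sum(eigvals / (eigvals + kappa)) - n
    lo, hi = 1e-16, eigvals.sum() + ridge + 1.0   # f(lo) > 0 > f(hi)
    for _ in range(200):
        mid = np.sqrt(lo * hi)                    # geometric bisection: kappa spans many decades
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol * hi:
            break
    return hi

def eigenlearning_predictions(eigvals, v, n, ridge=0.0):
    """Predict per-mode learnabilities, target learnability, and test MSE
    from kernel eigenvalues `eigvals` and target eigencoefficients `v`."""
    kappa = effective_regularization(eigvals, n, ridge)
    L = eigvals / (eigvals + kappa)               # per-mode learnability in [0, 1]
    overfit = n / (n - np.sum(L ** 2))            # variance amplification factor (>= 1)
    mse = overfit * np.sum((1 - L) ** 2 * v ** 2) # predicted test mean-squared error
    learnability = np.sum(L * v ** 2) / np.sum(v ** 2)
    return dict(kappa=kappa, mode_learnability=L,
                learnability=learnability, test_mse=mse)

# Toy usage: hypothetical power-law NTK spectrum, target equal to one eigenfunction.
eigvals = 1.0 / np.arange(1, 1001) ** 2
v = np.zeros_like(eigvals); v[4] = 1.0            # learn the 5th eigenmode
preds = eigenlearning_predictions(eigvals, v, n=100)
print(preds["learnability"], preds["test_mse"])

# "No free lunch" sanity check (ridgeless): per-mode learnabilities sum to n,
# so raising learnability on one mode must lower it on orthogonal modes.
print(np.sum(preds["mode_learnability"]))         # approximately 100
```

In this sketch, higher-eigenvalue modes of the kernel are learned more readily at a given sample size n, and the final print illustrates the tradeoff stated in the abstract: the total learnability budget across an orthonormal set of eigenfunctions is fixed by n.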
Supplementary Material: zip
Community Implementations: [1 code implementation (CatalyzeX)](https://www.catalyzex.com/paper/arxiv:2110.03922/code)