Understanding the role of depth in the neural tangent kernel for overparameterized neural networks.

20 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: neural tangent kernel, convergence, overparameterization
TL;DR: Insights into the behaviour of the NTK for deep overparameterized models.
Abstract: Overparameterized fully-connected neural networks have been shown to behave like kernel models when trained with gradient descent, under mild conditions on the width, the learning rate, and the parameter initialization. In the limit of infinitely large width and small learning rate, the resulting kernel allows the output of the learned model to be represented in closed form. This closed-form solution hinges on the invertibility of the limiting kernel, a property that often holds on real-world datasets. In this work, we analyze the sensitivity of large ReLU networks to increasing depth by characterizing the corresponding limiting kernel. Our theoretical results describe how the normalized limiting kernel approaches the matrix of ones, yet the corresponding closed-form solution approaches a fixed limit on the sphere. We evaluate empirically the order of magnitude of network depth required to observe this convergent behaviour, and we describe the essential properties that enable the generalization of our results to other kernels.
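A minimal sketch of the behaviour described in the abstract, assuming the standard infinite-width NTK recursion for fully-connected ReLU networks (closed-form arc-cosine expectations); the toy dataset, the depths, and the function name relu_ntk are illustrative choices, not the paper's actual experimental setup. It shows the normalized limiting kernel drifting toward the all-ones matrix as depth grows, while the kernel-regression closed form K^{-1} y stays computable as long as K is invertible.

```python
import numpy as np

def relu_ntk(X, depth):
    """Infinite-width NTK of a depth-`depth` fully-connected ReLU network,
    via the standard layer-wise recursion with He-style scaling (factor 2)."""
    # Layer-0 covariance: Gram matrix of the inputs.
    sigma = X @ X.T
    theta = sigma.copy()
    for _ in range(depth):
        d = np.sqrt(np.diag(sigma))
        # Cosine of the angle between the pre-activation Gaussians.
        cos = np.clip(sigma / np.outer(d, d), -1.0, 1.0)
        ang = np.arccos(cos)
        # Closed-form arc-cosine expectations for ReLU and its derivative.
        sigma_next = np.outer(d, d) * (np.sin(ang) + (np.pi - ang) * cos) / np.pi
        sigma_dot = (np.pi - ang) / np.pi
        # NTK recursion: Theta^(h) = Theta^(h-1) * Sigma_dot^(h) + Sigma^(h).
        theta = theta * sigma_dot + sigma_next
        sigma = sigma_next
    return theta

# Toy inputs on the unit sphere and random targets (illustrative only).
rng = np.random.default_rng(0)
X = rng.standard_normal((8, 5))
X /= np.linalg.norm(X, axis=1, keepdims=True)
y = rng.standard_normal(8)

for depth in (1, 10, 100, 1000):
    K = relu_ntk(X, depth)
    d = np.sqrt(np.diag(K))
    K_norm = K / np.outer(d, d)                     # normalized limiting kernel
    off_diag = K_norm[~np.eye(len(X), dtype=bool)]
    alpha = np.linalg.solve(K, y)                   # closed-form coefficients K^{-1} y
    print(f"depth {depth:5d}: min off-diag of normalized NTK = {off_diag.min():.4f}, "
          f"||K^-1 y|| = {np.linalg.norm(alpha):.3f}")
```

In this sketch the off-diagonal entries of the normalized kernel move toward 1 as depth increases, which is the "matrix of ones" limit stated in the abstract; the closed-form solve is only well defined while the (unnormalized) kernel remains invertible.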
Supplementary Material: pdf
Primary Area: learning theory
Submission Number: 23367