The Curse of Depth in Kernel Regime

Published: 18 Oct 2021, Last Modified: 05 May 2023. Venue: ICBINB@NeurIPS2021 Spotlight.
Keywords: Neural Tangent Kernel, Initialization, Deep Neural Networks.
TL;DR: Exact convergence rates (with respect to depth) for the NTK regime.
Abstract: Recent work by Jacot et al. (2018) has shown that training a neural network of any kind with gradient descent is strongly related to kernel gradient descent in function space with respect to the Neural Tangent Kernel (NTK). Empirical results in Lee et al. (2019) demonstrated the high performance of a linearized version of training in the so-called NTK regime. In this paper, we show that the large-depth limit of this regime is unexpectedly trivial, and we fully characterize the convergence rate to this trivial regime.
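
A minimal numerical sketch (not an artifact of the paper) of the phenomenon the abstract describes: it estimates the empirical NTK of a deep ReLU network at initialization for two unit-norm inputs and tracks the normalized off-diagonal kernel value as depth grows. The architecture, He-style initialization, and all function names below are illustrative assumptions; under the paper's claim, the NTK Gram matrix degenerates in the large-depth limit, which here would show up as the normalized value drifting toward a constant.

```python
import jax
import jax.numpy as jnp

def init_params(key, widths):
    """He-style Gaussian initialization for an MLP with the given layer widths."""
    params = []
    for d_in, d_out in zip(widths[:-1], widths[1:]):
        key, sub = jax.random.split(key)
        params.append(jax.random.normal(sub, (d_in, d_out)) * jnp.sqrt(2.0 / d_in))
    return params

def mlp(params, x):
    """Scalar-output ReLU MLP; the last layer is linear."""
    h = x
    for w in params[:-1]:
        h = jax.nn.relu(h @ w)
    return (h @ params[-1])[0]

def ntk_entry(params, x1, x2):
    """Empirical NTK value <grad_theta f(x1), grad_theta f(x2)> at initialization."""
    g1 = jax.grad(mlp)(params, x1)
    g2 = jax.grad(mlp)(params, x2)
    return sum(jnp.vdot(a, b) for a, b in zip(g1, g2))

key = jax.random.PRNGKey(0)
d, width = 16, 256
x1 = jax.random.normal(jax.random.PRNGKey(1), (d,))
x2 = jax.random.normal(jax.random.PRNGKey(2), (d,))
x1, x2 = x1 / jnp.linalg.norm(x1), x2 / jnp.linalg.norm(x2)

for depth in [2, 8, 32, 128]:
    widths = [d] + [width] * depth + [1]
    params = init_params(key, widths)
    k11 = ntk_entry(params, x1, x1)
    k22 = ntk_entry(params, x2, x2)
    k12 = ntk_entry(params, x1, x2)
    # A normalized value approaching 1 indicates the Gram matrix is
    # collapsing toward a degenerate (near rank-one) kernel.
    print(depth, float(k12 / jnp.sqrt(k11 * k22)))
```

This uses a single finite-width draw, so the printed values are noisy estimates of the limiting kernel; the paper's result concerns the exact rate at which such degeneracy sets in with depth.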
Category: Negative result (sharing insights and negative results on this topic with the community); Criticism of default practices (questioning a widespread practice in the community); Other (please specify).
Category Description: We show exactly how fast the NTK regime fails.