Keywords: deep learning theory, neural networks, kernels
TL;DR: We propose an explanation for the gap between neural networks and their corresponding neural tangent kernels based on the ability of neural networks to find a sparse, localized signal in the presence of noise.
Abstract: Neural networks have been shown to outperform kernel methods in practice (including neural tangent kernels). Most theoretical explanations of this performance gap focus on learning a complex hypothesis class; in some cases, it is unclear whether this hypothesis class captures realistic data. In this work, we propose a related, but alternative, explanation for this performance gap in the image classification setting, based on finding a sparse signal in the presence of noise. Specifically, we prove that, for a simple data distribution with sparse signal amidst high-variance noise, a simple convolutional neural network trained using stochastic gradient descent learns to threshold out the noise and find the signal. On the other hand, the corresponding neural tangent kernel, with a fixed set of predetermined features, is unable to adapt to the signal in this manner. We supplement our theoretical results by demonstrating this phenomenon empirically: in CIFAR-10 and MNIST images with various backgrounds, as the background noise increases in intensity, a CNN's performance stays relatively robust, whereas its corresponding neural tangent kernel sees a notable drop in performance. We therefore propose the "local signal adaptivity" (LSA) phenomenon as one explanation for the superiority of neural networks over kernel methods.
Supplementary Material: pdf
Code Of Conduct: I certify that all co-authors of this work have read and commit to adhering to the NeurIPS Statement on Ethics, Fairness, Inclusivity, and Code of Conduct.