Learnability in the Context of Neural Tangent Kernels

Published: 10 Oct 2024, Last Modified: 09 Nov 2024, SciForDL Poster, CC BY 4.0
TL;DR: Can the Neural Tangent Kernel tell us which samples a network finds easy or hard to learn? We derive bounds on per-sample error via the NTK and show which samples a CNN does and does not find hard to learn.
Abstract: Understanding the prioritization of certain samples over others during neural network training is a fundamental challenge in deep learning. This prioritization is intrinsically linked to the network's inductive bias—the inherent assumptions that enable generalization from training data to unseen data. In this study, we investigate the role of the diagonal elements of the Neural Tangent Kernel (NTK), \( k(x, x) \), in determining sample learnability. Through theoretical analysis, we demonstrate that higher values of \( k(x, x) \) correlate with faster convergence rates of individual sample errors during training, indicating that such samples are learned more rapidly and accurately. Conversely, lower \( k(x, x) \) values are associated with slower learning dynamics, classifying these samples as harder to learn. Empirical evaluations conducted on standard datasets, including MNIST and CIFAR-10, using convolutional neural networks (CNNs), validate our theoretical predictions. We observe that samples with higher \( k(x, x) \) values consistently achieve higher accuracy in fewer training epochs compared to those with lower values. Visual inspections further reveal that high-\( k(x, x) \) samples are typically clear and prototypical, whereas low-\( k(x, x) \) samples often exhibit noise or atypical characteristics.
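The following is a minimal sketch, not taken from the paper, of how one might estimate the NTK diagonal \( k(x, x) = \lVert \nabla_\theta f(x; \theta) \rVert^2 \) for individual samples and rank them by it. In the linearized (NTK) regime, a sample's error under gradient flow on the squared loss decays at a rate governed by the kernel, which is the intuition the abstract's per-sample bounds formalize. The toy CNN, the random stand-in inputs, and the choice of a single output logit as \( f(x; \theta) \) are assumptions made purely for illustration.

    # A minimal sketch (not the authors' code): estimate the NTK diagonal
    # k(x, x) = ||grad_theta f(x; theta)||^2 for a single output logit,
    # then rank samples by it. Model, inputs, and logit choice are assumptions.
    import torch
    import torch.nn as nn

    def ntk_diagonal(model: nn.Module, x: torch.Tensor, class_idx: int = 0) -> float:
        """Return k(x, x): squared norm of the parameter gradient of one logit."""
        params = [p for p in model.parameters() if p.requires_grad]
        out = model(x.unsqueeze(0))            # shape (1, num_classes)
        scalar = out[0, class_idx]             # treat one logit as f(x; theta)
        grads = torch.autograd.grad(scalar, params)
        return sum(g.pow(2).sum().item() for g in grads)

    if __name__ == "__main__":
        # Toy CNN on MNIST-shaped inputs, purely illustrative.
        model = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
            nn.Flatten(), nn.Linear(8 * 28 * 28, 10),
        )
        samples = torch.randn(16, 1, 28, 28)   # stand-in for real images
        scores = [ntk_diagonal(model, x) for x in samples]
        ranking = sorted(range(len(samples)), key=lambda i: scores[i], reverse=True)
        print("Samples ordered from largest to smallest k(x, x):", ranking)

In practice one would average over output classes or use the full per-sample Jacobian rather than a single logit; the single-logit choice here only keeps the sketch short.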
Style Files: I have used the style files.
Submission Number: 58