Abstract: Deep neural networks have been shown to incorporate certain inductive biases and structural information about the input data, even at initialization when the network weights are randomly sampled from a prior distribution. We show that this phenomenon may be linked to a new type of neural network kernel, which we call the Neural Network Prior Kernel (NNPK). The NNPK value between two input examples is given by the expected inner product of their logits, where the expectation is calculated with respect to the prior weight distribution of the network. Although the NNPK is inherently infinite-dimensional, we study its properties empirically via a finite-sample approximation by representing the input examples as vectors of finitely many logits obtained from repeated random initializations of a network. Our analysis suggests that certain structures in the data that emerge from a trained model are already present at initialization. Our findings also indicate that the NNPK conveys information about the generalization performance of architectures. We then provide a theoretical result that connects the NNPK to the Neural Tangent Kernel (NTK) for infinitely wide networks. We validate this connection empirically for a number of standard image classification models. Finally, we present an application of the NNPK for dataset distillation based on kernel ridge regression.
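The finite-sample approximation described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes a one-hidden-layer ReLU network with Gaussian prior weights (the architecture, prior, and all parameter names here are hypothetical), and estimates the NNPK between two inputs by averaging the inner product of their logits over repeated random initializations.

```python
import numpy as np

def init_mlp(rng, d_in, d_hidden, d_out):
    # Sample weights from a Gaussian prior (an assumed choice of prior).
    W1 = rng.normal(0.0, 1.0 / np.sqrt(d_in), (d_in, d_hidden))
    W2 = rng.normal(0.0, 1.0 / np.sqrt(d_hidden), (d_hidden, d_out))
    return W1, W2

def logits(x, W1, W2):
    # Forward pass of a one-hidden-layer ReLU network; returns the logit vector.
    return np.maximum(x @ W1, 0.0) @ W2

def nnpk_estimate(x1, x2, n_init=500, d_hidden=64, d_out=10, seed=0):
    # Finite-sample NNPK: average the inner product of the two inputs'
    # logit vectors over n_init independent random initializations.
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_init):
        W1, W2 = init_mlp(rng, x1.shape[0], d_hidden, d_out)
        total += float(logits(x1, W1, W2) @ logits(x2, W1, W2))
    return total / n_init
```

Stacking the logit vectors from each initialization yields the finite-dimensional representation of an input mentioned in the abstract; the Monte Carlo average above converges to the expected inner product as the number of initializations grows.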
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Yingnian_Wu1
Submission Number: 1032