Beyond ReLU: How Activations Affect Neural Kernels and Random Wide Networks
TL;DR: We characterize infinite-width NNGP and NTK kernels through their RKHS, covering common activation functions.
Abstract: In recent years, the neural tangent kernel (NTK) and the neural network Gaussian process (NNGP) kernel have given theoreticians tractable limiting cases of fully connected neural networks. However, the properties of these kernels are poorly understood for activation functions other than powers of the ReLU.
Our main contribution is a characterization of the RKHS of these kernels for activation functions whose only non-smoothness is at zero.
This extends existing theory to numerous commonly used activation functions such as SELU, ELU, or LeakyReLU.
Additionally, we analyze a broad set of special cases, such as networks without biases, two-layer networks, and polynomial activations.
Our results show that a broad class of activations that are not infinitely smooth generates, up to equivalence, the same RKHS at every network depth, and that this RKHS is determined only by the degree of the non-smoothness. In contrast, the RKHS generated by polynomial activations does depend on the network depth.
Finally, we derive results for the smoothness of NNGP sample paths, characterizing the smoothness of infinitely wide neural networks at initialization.
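For context, the kernels referenced above are the standard infinite-width limits of a fully connected network with activation $\phi$. A minimal sketch of the usual layerwise recursions is given below; the weight and bias variances $\sigma_w^2$, $\sigma_b^2$ are generic placeholders, since the abstract does not specify a parameterization.

% Standard NNGP recursion (Lee et al., 2018): covariance of the pre-activations
% at layer l+1 for inputs x, x' in R^d.
\Sigma^{(1)}(x, x') = \frac{\sigma_w^2}{d}\, x^\top x' + \sigma_b^2, \qquad
\Sigma^{(l+1)}(x, x') = \sigma_w^2\, \mathbb{E}_{u \sim \mathcal{N}(0,\, \Sigma^{(l)})}\bigl[\phi(u(x))\,\phi(u(x'))\bigr] + \sigma_b^2.

% Standard NTK recursion (Jacot et al., 2018), built from the same quantities
% together with the derivative kernel:
\dot{\Sigma}^{(l+1)}(x, x') = \sigma_w^2\, \mathbb{E}_{u \sim \mathcal{N}(0,\, \Sigma^{(l)})}\bigl[\phi'(u(x))\,\phi'(u(x'))\bigr],
\Theta^{(1)} = \Sigma^{(1)}, \qquad
\Theta^{(l+1)}(x, x') = \Sigma^{(l+1)}(x, x') + \dot{\Sigma}^{(l+1)}(x, x')\,\Theta^{(l)}(x, x').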
Submission Number: 1478