Keywords: Neural network initialization, AI interpretability, Activation functions
TL;DR: Certain neural networks at initialization behave like random functions, and we propose these as interesting objects for interpretability and the computational no-coincidence conjecture.
Abstract: We establish that randomly initialized neural networks have nearly independent outputs when they have large width, common hyperparameters, and a nonlinear activation function $\sigma$ with zero mean under the Gaussian measure, $\mathbb{E}_{z \sim \mathcal{N}(0,1)}[\sigma(z)]=0$. This includes tanh networks, complementing prior research on their bias towards complex functions.
For our hyperparameters, the zero-mean condition is strict: our analysis also shows that ReLU and GeLU networks, which violate it, have strongly correlated outputs.
Because tanh and related neural networks have nearly independent outputs, we propose them as a promising construction for the computational no-coincidence conjecture, which aims to measure the limits of AI interpretability.
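As a minimal illustrative sketch (not part of the submission), the Monte Carlo check below estimates $\mathbb{E}_{z \sim \mathcal{N}(0,1)}[\sigma(z)]$ for a few common activations, showing that tanh satisfies the zero-mean condition while ReLU and GeLU do not; it assumes NumPy and SciPy are available.

```python
import numpy as np
from scipy.stats import norm

# Monte Carlo estimate of E_{z ~ N(0,1)}[sigma(z)] for common activations.
rng = np.random.default_rng(0)
z = rng.standard_normal(1_000_000)

activations = {
    "tanh": np.tanh(z),
    "ReLU": np.maximum(z, 0.0),
    "GeLU": z * norm.cdf(z),  # exact GeLU: z * Phi(z), Phi = standard normal CDF
}

for name, values in activations.items():
    print(f"E[{name}(z)] ~ {values.mean():+.4f}")

# Expected output (up to Monte Carlo noise):
#   E[tanh(z)] ~ +0.0000
#   E[ReLU(z)] ~ +0.3989   (= 1/sqrt(2*pi))
#   E[GeLU(z)] ~ +0.2821   (= 1/(2*sqrt(pi)))
```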
Serve As Reviewer: ~John_Dunbar1
Confirmation: I confirm that I and my co-authors have read the policies and are releasing our work under a CC-BY 4.0 license.
Submission Number: 26