Observation Noise and Initialization in Wide Neural Networks

Published: 19 Mar 2025 (Last Modified: 25 Apr 2025), AABI 2025 Workshop Track, CC BY 4.0
Keywords: Neural Tangent Kernel, Gaussian Processes, Wide Neural Networks, Observation Noise, Initialization
TL;DR: We formally show that weight-space regularization in wide neural networks is equivalent to incorporating aleatoric (observation) noise into the NTK-GP posterior mean, preserving the linearization and enabling initialization with arbitrary prior means.
Abstract: Performing gradient descent in a wide neural network is equivalent to computing the posterior mean of a Gaussian Process with the Neural Tangent Kernel (NTK-GP), for a specific choice of prior mean and with zero observation noise. However, existing formulations of this result have two limitations: i) the resultant NTK-GP assumes no noise in the observed target variables, which can result in suboptimal predictions with noisy data; ii) it is unclear how to extend the equivalence to an arbitrary prior mean, a crucial aspect of formulating a well-specified model. To address the first limitation, we introduce a regularizer into the neural network's training objective, formally showing its correspondence to incorporating observation noise into the NTK-GP model. To address the second, we introduce a "shifted network" that enables arbitrary prior mean functions. This approach allows us to perform gradient descent on a single neural network, without expensive ensembling or kernel matrix inversion. Our theoretical insights are validated empirically, with experiments exploring different values of observation noise and network architectures.
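The following is a minimal sketch (not the authors' code) of the correspondence stated in the abstract: gradient descent on a linearized network whose objective includes a weight-space regularizer toward the initial parameters reproduces the NTK-GP posterior mean with observation noise, with the network's output at initialization acting as the prior mean. The toy MLP, data, hyperparameters, and the identification of the regularization strength with the noise variance are illustrative assumptions, not details taken from the paper.

```python
# Sketch: regularized gradient descent on a linearized network vs. noisy NTK-GP posterior mean.
# All architectural and hyperparameter choices below are illustrative assumptions.
import jax
import jax.numpy as jnp
from jax.flatten_util import ravel_pytree

key = jax.random.PRNGKey(0)
k1, k2, k3 = jax.random.split(key, 3)

# Toy 1-D regression data with noisy targets.
X = jax.random.uniform(k1, (20, 1), minval=-1.0, maxval=1.0)
y = jnp.sin(3.0 * X[:, 0]) + 0.1 * jax.random.normal(k2, (20,))
X_test = jnp.linspace(-1.0, 1.0, 50)[:, None]

def init_params(key, widths=(1, 256, 256, 1)):
    params = []
    for m, n in zip(widths[:-1], widths[1:]):
        key, sub = jax.random.split(key)
        # NTK parameterization: standard-normal weights, rescaled by 1/sqrt(fan_in) in the forward pass.
        params.append((jax.random.normal(sub, (m, n)), jnp.zeros(n)))
    return params

def forward(params, x):
    h = x
    for i, (W, b) in enumerate(params):
        h = h @ W / jnp.sqrt(W.shape[0]) + b
        if i < len(params) - 1:
            h = jax.nn.relu(h)
    return h[:, 0]

params0 = init_params(k3)
flat0, unravel = ravel_pytree(params0)

def f(flat, x):
    return forward(unravel(flat), x)

# Empirical NTK at initialization: Theta(A, B) = J(A) J(B)^T.
J_train = jax.jacobian(f)(flat0, X)        # (n_train, n_params)
J_test = jax.jacobian(f)(flat0, X_test)    # (n_test, n_params)
Theta_tt = J_train @ J_train.T
Theta_st = J_test @ J_train.T

lam = 0.1  # plays the role of the observation-noise variance (up to the paper's exact scaling conventions)
f0_train, f0_test = f(flat0, X), f(flat0, X_test)

# Noisy NTK-GP posterior mean, with the network's output at initialization as the prior mean.
gp_mean = f0_test + Theta_st @ jnp.linalg.solve(
    Theta_tt + lam * jnp.eye(len(X)), y - f0_train)

# Gradient descent on the linearized network with a weight-space regularizer toward initialization.
def loss(delta):
    f_lin = f0_train + J_train @ delta      # first-order Taylor expansion around the initial weights
    return 0.5 * jnp.sum((f_lin - y) ** 2) + 0.5 * lam * jnp.sum(delta ** 2)

lr = 1.0 / (jnp.linalg.eigvalsh(Theta_tt)[-1] + lam)  # step size safe for the quadratic objective
grad_fn = jax.jit(jax.grad(loss))
delta = jnp.zeros_like(flat0)
for _ in range(30_000):  # run to near-convergence
    delta = delta - lr * grad_fn(delta)

nn_mean = f0_test + J_test @ delta
print("max |GP mean - regularized linearized net|:",
      jnp.max(jnp.abs(gp_mean - nn_mean)))
```

Under these assumptions the two predictions agree up to optimization error, since the regularized least-squares solution in weight space maps exactly onto the kernel-ridge form Theta(x*, X)(Theta(X, X) + lam I)^{-1}(y - f0(X)); for a genuinely wide network the linearized model is the regime in which the paper's equivalence is stated.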
Submission Number: 28
