Observation Noise and Initialization in Wide Neural Networks

Published: 19 Mar 2025 (Last Modified: 25 Apr 2025), AABI 2025 Workshop Track, CC BY 4.0
Keywords: Neural Tangent Kernel, Gaussian Processes, Wide Neural Networks, Observation Noise, Initialization
TL;DR: We formally show that weight-space regularization in wide neural networks is equivalent to incorporating aleatoric (observation) noise into the NTK-GP posterior mean, preserving the linearization and enabling initialization with arbitrary prior means.
Abstract: Performing gradient descent in a wide neural network is equivalent to computing the posterior mean of a Gaussian Process with the Neural Tangent Kernel (NTK-GP), for a specific choice of prior mean and with zero observation noise. However, existing formulations of this result have two limitations: i) the resultant NTK-GP assumes no noise in the observed target variables, which can result in suboptimal predictions with noisy data; ii) it is unclear how to extend the equivalence to an arbitrary prior mean, a crucial aspect of formulating a well-specified model. To address the first limitation, we introduce a regularizer into the neural network's training objective, formally showing its correspondence to incorporating observation noise into the NTK-GP model. To address the second, we introduce a "shifted network" that enables arbitrary prior mean functions. This approach allows us to perform gradient descent on a single neural network, without expensive ensembling or kernel matrix inversion. Our theoretical insights are validated empirically, with experiments exploring different values of observation noise and network architectures.
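The following is a minimal sketch (not the authors' code) of the correspondence stated in the abstract: gradient descent on a linearized network whose objective includes a weight-space regularizer toward the initial parameters reproduces the NTK-GP posterior mean with observation noise, with the network's output at initialization acting as the prior mean. The toy MLP, data, hyperparameters, and the identification of the regularization strength with the noise variance are illustrative assumptions, not details taken from the paper.

```python
# Sketch: regularized gradient descent on a linearized network vs. noisy NTK-GP posterior mean.
# All architectural and hyperparameter choices below are illustrative assumptions.
import jax
import jax.numpy as jnp
from jax.flatten_util import ravel_pytree

key = jax.random.PRNGKey(0)
k1, k2, k3 = jax.random.split(key, 3)

# Toy 1-D regression data with noisy targets.
X = jax.random.uniform(k1, (20, 1), minval=-1.0, maxval=1.0)
y = jnp.sin(3.0 * X[:, 0]) + 0.1 * jax.random.normal(k2, (20,))
X_test = jnp.linspace(-1.0, 1.0, 50)[:, None]

def init_params(key, widths=(1, 256, 256, 1)):
    params = []
    for m, n in zip(widths[:-1], widths[1:]):
        key, sub = jax.random.split(key)
        # NTK parameterization: standard-normal weights, rescaled by 1/sqrt(fan_in) in the forward pass.
        params.append((jax.random.normal(sub, (m, n)), jnp.zeros(n)))
    return params

def forward(params, x):
    h = x
    for i, (W, b) in enumerate(params):
        h = h @ W / jnp.sqrt(W.shape[0]) + b
        if i < len(params) - 1:
            h = jax.nn.relu(h)
    return h[:, 0]

params0 = init_params(k3)
flat0, unravel = ravel_pytree(params0)

def f(flat, x):
    return forward(unravel(flat), x)

# Empirical NTK at initialization: Theta(A, B) = J(A) J(B)^T.
J_train = jax.jacobian(f)(flat0, X)        # (n_train, n_params)
J_test = jax.jacobian(f)(flat0, X_test)    # (n_test, n_params)
Theta_tt = J_train @ J_train.T
Theta_st = J_test @ J_train.T

lam = 0.1  # plays the role of the observation-noise variance (up to the paper's exact scaling conventions)
f0_train, f0_test = f(flat0, X), f(flat0, X_test)

# Noisy NTK-GP posterior mean, with the network's output at initialization as the prior mean.
gp_mean = f0_test + Theta_st @ jnp.linalg.solve(
    Theta_tt + lam * jnp.eye(len(X)), y - f0_train)

# Gradient descent on the linearized network with a weight-space regularizer toward initialization.
def loss(delta):
    f_lin = f0_train + J_train @ delta      # first-order Taylor expansion around the initial weights
    return 0.5 * jnp.sum((f_lin - y) ** 2) + 0.5 * lam * jnp.sum(delta ** 2)

lr = 1.0 / (jnp.linalg.eigvalsh(Theta_tt)[-1] + lam)  # step size safe for the quadratic objective
grad_fn = jax.jit(jax.grad(loss))
delta = jnp.zeros_like(flat0)
for _ in range(30_000):  # run to near-convergence
    delta = delta - lr * grad_fn(delta)

nn_mean = f0_test + J_test @ delta
print("max |GP mean - regularized linearized net|:",
      jnp.max(jnp.abs(gp_mean - nn_mean)))
```

Under these assumptions the two predictions agree up to optimization error, since the regularized least-squares solution in weight space maps exactly onto the kernel-ridge form Theta(x*, X)(Theta(X, X) + lam I)^{-1}(y - f0(X)); for a genuinely wide network the linearized model is the regime in which the paper's equivalence is stated.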
Submission Number: 28
