A Gaussian Process View on Observation Noise and Initialization in Wide Neural Networks
TL;DR: We propose a framework for training wide neural networks that is formally equivalent to computing the NTK-GP posterior mean with aleatoric (observation) noise, while preserving linearization and enabling initialization at an arbitrary prior mean.
Abstract: Performing gradient descent in a wide neural network is equivalent to computing the posterior mean of a Gaussian Process with the Neural Tangent Kernel (NTK-GP), for a specific choice of prior mean and with zero observation noise. However, existing formulations of this result have two limitations: i) the resultant NTK-GP assumes no noise in the observed target variables, which can result in suboptimal predictions with noisy data; ii) it is unclear how to extend the equivalence to an arbitrary prior mean, a crucial aspect of formulating a well-specified model. To address the first limitation, we introduce a regularizer into the neural network's training objective, formally showing its correspondence to incorporating observation noise into the NTK-GP model. To address the second, we introduce a "shifted network" that enables arbitrary prior mean functions. This approach further allows us to obtain the posterior mean with gradient descent on a single neural network, without expensive ensembling or kernel matrix inversion. Our theoretical insights are validated empirically, with experiments exploring different values of observation noise, datasets, and network architectures. These results remove key obstacles that have limited the practical use of NTK-GP equivalence in applied Gaussian process modeling.
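To make the abstract's ingredients concrete, below is a minimal sketch (in JAX; not the authors' code). The callables `ntk`, `prior_mean`, and `f`, the noise level `sigma2`, the specific L2-to-initialization form of the regularizer, and the additive form of the shifted network are all illustrative assumptions, not details taken from the paper.

```python
import jax.numpy as jnp

def ntk_gp_posterior_mean(ntk, X_train, y_train, X_test, sigma2, prior_mean):
    """NTK-GP posterior mean with observation noise sigma2 and prior mean m:
        m(X*) + K(X*, X) [K(X, X) + sigma2 I]^{-1} (y - m(X)).
    `ntk` is any callable returning the kernel matrix between two input sets."""
    K_xx = ntk(X_train, X_train)                 # (n, n) train-train kernel
    K_sx = ntk(X_test, X_train)                  # (m, n) test-train kernel
    A = K_xx + sigma2 * jnp.eye(K_xx.shape[0])   # observation noise on the diagonal
    resid = y_train - prior_mean(X_train)        # centre targets by the prior mean
    return prior_mean(X_test) + K_sx @ jnp.linalg.solve(A, resid)

def regularized_loss(f, theta, theta0, X, y, sigma2):
    """One plausible form of the regularizer the abstract alludes to: an L2
    penalty pulling the parameters (a list of arrays) back to their
    initialization theta0. In the linearized (wide-network) regime this
    objective corresponds to kernel ridge regression with the NTK."""
    data_term = 0.5 * jnp.sum((f(theta, X) - y) ** 2)
    reg_term = 0.5 * sigma2 * sum(jnp.sum((p - p0) ** 2)
                                  for p, p0 in zip(theta, theta0))
    return data_term + reg_term

def shifted_network(f, theta, theta0, prior_mean, X):
    """One plausible construction of a "shifted network": subtract the output
    at initialization and add the desired prior mean m(x), so the model's
    output at initialization equals m(x) (assumed form, not necessarily the
    paper's construction)."""
    return f(theta, X) - f(theta0, X) + prior_mean(X)
```

Under these assumed forms, running gradient descent on `regularized_loss` applied to `shifted_network` would, in the infinite-width limit, recover the prediction of `ntk_gp_posterior_mean`, which is the kind of correspondence the abstract describes.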
Submission Number: 1683