Stochastic Neural Tangent Kernel: Revisiting the NTK For SGD

Published: 22 Sept 2025, Last Modified: 01 Dec 2025, NeurIPS 2025 Workshop, CC BY 4.0
Keywords: Neural Tangent Kernel, Stochastic Gradient Descent, Infinite-Width Neural Networks, Stochastic Differential Equations, Minibatch Noise, Generalization, Flat Minima, Function-Space Dynamics, Kernel Regression
Abstract: Stochastic Gradient Descent (SGD) is a foundational algorithm for training neural networks, valued for its efficiency and for the generalization benefits conferred by its intrinsic stochasticity. Existing theoretical frameworks such as the Neural Tangent Kernel (NTK) describe training dynamics in the infinite-width limit but omit the stochastic effects of minibatch sampling. This paper introduces the Stochastic Neural Tangent Kernel (SNTK), an extension of the NTK that incorporates the adaptive, residual-weighted noise arising from SGD's minibatch sampling. By modeling SGD as a continuous-time stochastic differential equation projected into function space, we rigorously characterize the noise structure and derive a kernel that captures the evolving stochastic dynamics. Our formulation unifies deterministic kernel regression and stochastic optimization, showing how minibatch noise steers training toward flatter minima and more robust solutions. Empirical evaluations on the two-moons dataset show that the SNTK represents SGD's function-space behavior more faithfully than the classical NTK. This work addresses a significant gap in the understanding of SGD's function-space dynamics and provides a tool for further study of optimization, generalization, and the interplay between noise and representation learning in wide neural networks.
Submission Number: 145
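
A minimal sketch of the function-space picture described in the abstract, for readers who want to experiment: it contrasts full-batch kernel gradient flow (the classical NTK view) with a minibatch update whose deviation from that flow is weighted by the current residual, on a two-moons toy problem. This is not the paper's SNTK derivation; the RBF kernel standing in for the NTK, sklearn's make_moons data, and the step size, batch size, and step count are all illustrative assumptions.

```python
# Sketch, not the paper's method: function-space dynamics under a fixed kernel.
# Loss convention: (1/2n) * ||f - y||^2, so gradient flow is df/dt = -(1/n) K (f - y).
import numpy as np
from sklearn.datasets import make_moons

rng = np.random.default_rng(0)
X, y = make_moons(n_samples=200, noise=0.1, random_state=0)
y = 2.0 * y - 1.0                            # labels in {-1, +1}
n = len(y)

# RBF kernel as a stand-in for the (fixed, infinite-width) NTK Gram matrix;
# the 1/n factor comes from the mean-squared loss.
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-sq_dists / (2 * 0.5 ** 2)) / n

eta, steps, batch = 0.5, 3000, 16            # illustrative hyperparameters

# Full-batch kernel gradient flow (classical NTK picture), Euler-discretized.
f_gd = np.zeros(n)
for _ in range(steps):
    f_gd -= eta * K @ (f_gd - y)

# Minibatch dynamics: only the sampled residuals drive each step, so the
# injected noise is weighted by the current residual f - y and shrinks as it
# vanishes, mirroring the adaptive, residual-weighted noise the abstract describes.
f_sgd = np.zeros(n)
for _ in range(steps):
    idx = rng.choice(n, size=batch, replace=False)
    f_sgd -= eta * (n / batch) * K[:, idx] @ (f_sgd[idx] - y[idx])

print("full-batch train accuracy:", np.mean(np.sign(f_gd) == y))
print("minibatch  train accuracy:", np.mean(np.sign(f_sgd) == y))
```

In this toy setting the more informative comparison is between the two trajectories over training, not just the final accuracies, since the stochastic term decays with the residual.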