Revisiting the Noise Model of SGD

Published: 23 Oct 2023, Last Modified: 13 Nov 2023, HeavyTails 2023
Keywords: stochastic gradient descent (SGD), stochastic gradient noise (SGN), Levy noise
Abstract: The effectiveness of stochastic gradient descent (SGD) is significantly influenced by stochastic gradient noise (SGN). Following the central limit theorem, SGN was initially modeled as Gaussian, but Simsekli et al. recently demonstrated that a symmetric α-stable (SαS) Lévy distribution characterizes it better. Here, we revisit the noise model of SGD and provide robust, comprehensive empirical evidence that SGN is heavy-tailed and better represented by the SαS distribution. Furthermore, we argue that different deep neural network (DNN) parameters exhibit distinct SGN characteristics throughout training. We develop a novel framework based on a Lévy-driven stochastic differential equation (SDE), in which a one-dimensional Lévy process describes each DNN parameter. This leads to a more accurate characterization of the dynamics of SGD around local minima.
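The Gaussian-versus-SαS distinction at the heart of the abstract can be illustrated with a small sketch (not the paper's code): `scipy.stats.levy_stable` samples a symmetric α-stable law, and the tail index α = 1.5 below is an illustrative choice, not a value taken from the paper. The Gaussian is the α = 2 special case of the stable family; for α < 2 the extreme values of the noise grow much faster, which is the heavy-tailed behavior the paper argues SGN exhibits.

```python
# Sketch: contrast Gaussian noise with symmetric alpha-stable (SaS) noise.
# alpha = 1.5 is an illustrative tail index, not one reported in the paper.
import numpy as np
from scipy.stats import levy_stable, norm

rng = np.random.default_rng(0)
n = 100_000

# Gaussian noise: the stable law with alpha = 2.
gauss = norm.rvs(size=n, random_state=rng)

# SaS noise: beta = 0 makes the stable law symmetric.
sas = levy_stable.rvs(alpha=1.5, beta=0.0, size=n, random_state=rng)

# Heavy tails show up as far larger extreme values at the same unit scale:
# the Gaussian maximum stays near 4-5, the SaS maximum is orders larger.
print("max |Gaussian|:", np.max(np.abs(gauss)))
print("max |SaS|:     ", np.max(np.abs(sas)))
```

The same comparison, applied to per-parameter minibatch gradient noise instead of synthetic samples, is the kind of empirical evidence the abstract refers to.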
Submission Number: 12