Correlated Noise in Epoch-Based Stochastic Gradient Descent: Implications for Weight Variances

Published: 16 Jun 2024, Last Modified: 15 Jul 2024
Venue: HiLD at ICML 2024 Poster
License: CC BY 4.0
Keywords: Stochastic Gradient Descent, Asymptotic Analysis, Discrete Time, Hessian
TL;DR: This paper challenges the assumption of uncorrelated noise in stochastic gradient descent with momentum, calculating the autocorrelation function of epoch-based noise and revealing a reduced weight variance in flat directions.
Abstract: Stochastic gradient descent (SGD) is a fundamental optimization method in neural networks, yet the noise it introduces is often assumed to be uncorrelated over time. This paper challenges that assumption by examining epoch-based noise correlations in discrete-time SGD with momentum under a quadratic loss. Assuming that the noise is independent of small fluctuations in the weight vector, we calculate the exact autocorrelation of the noise and find that SGD noise is anti-correlated in time. We explore the impact of these anti-correlations on SGD dynamics, finding that for directions with curvature below a hyperparameter-dependent crossover value, the weight variance is significantly reduced relative to the uncorrelated-noise prediction. This reduction leads to decreased loss fluctuations, which we relate to SGD’s ability to find flat minima, thereby enhancing generalization performance.
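Below is a minimal sketch (not the authors' code) of the setup the abstract describes: discrete-time heavy-ball SGD on a one-dimensional quadratic loss with epoch-based, without-replacement sampling. Because each epoch visits every sample exactly once, the per-step gradient noise sums to zero over an epoch, which is the source of the anti-correlation in time; the script estimates that autocorrelation empirically and reports the stationary weight variance. All hyperparameter values (learning rate, momentum, dataset size, curvature) are illustrative choices, not values from the paper.

```python
# Minimal simulation (illustrative, not the authors' code) of epoch-based
# SGD with momentum on a quadratic loss, estimating the noise autocorrelation.
import numpy as np

rng = np.random.default_rng(0)

n_samples = 64          # dataset size; one epoch = n_samples steps
curvature = 0.1         # Hessian eigenvalue h of the quadratic loss (h/2) w^2
lr, beta = 0.05, 0.9    # learning rate and momentum coefficient (assumed values)
n_epochs = 2000

# Per-sample losses (h/2)(w - c_i)^2 with sum(c_i) = 0, so the full-batch
# minimum is w = 0 and the per-step noise is -h * c_i, independent of w
# (matching the abstract's assumption on the noise).
c = rng.normal(size=n_samples)
c -= c.mean()

w, v = 0.0, 0.0
noise_trace, w_trace = [], []
for _ in range(n_epochs):
    for i in rng.permutation(n_samples):      # epoch-based (without-replacement) sampling
        g = curvature * (w - c[i])            # stochastic gradient of sample i
        noise_trace.append(-curvature * c[i]) # noise relative to the full-batch gradient
        v = beta * v - lr * g                 # heavy-ball momentum update
        w = w + v
        w_trace.append(w)

noise = np.array(noise_trace)
# Empirical noise autocorrelation at a few lags: negative at within-epoch lags,
# near zero once most pairs straddle an epoch boundary.
for lag in (1, n_samples // 2, n_samples - 1):
    acf = np.mean(noise[:-lag] * noise[lag:]) / np.mean(noise**2)
    print(f"lag {lag:3d}: autocorrelation {acf:+.4f}")
print(f"stationary weight variance ~ {np.var(w_trace[len(w_trace)//2:]):.3e}")
```

The negative within-epoch values follow from sampling without replacement: for a mean-zero dataset of size n, two distinct positions in the same shuffled epoch have correlation -1/(n-1), while positions in different epochs are uncorrelated.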
Student Paper: Yes
Submission Number: 32