Keywords: Stochastic Gradient Descent, Asymptotic Analysis, Discrete Time, Hessian
TL;DR: This paper challenges the assumption of uncorrelated noise in stochastic gradient descent with momentum, calculating the autocorrelation function of epoch-based noise and revealing a reduced weight variance in flat directions.
Abstract: Stochastic Gradient Descent (SGD) has become a cornerstone of neural network optimization due to its computational efficiency and generalization capabilities. However, the noise introduced by SGD is often assumed to be uncorrelated over time, despite the common practice of epoch-based training where data is sampled without replacement. In this work, we challenge this assumption and investigate the effects of epoch-based noise correlations on the stationary distribution of discrete-time SGD with momentum. Our main contributions are twofold: First, we calculate the exact autocorrelation of the noise during epoch-based training under the assumption that the noise is independent of small fluctuations in the weight vector, revealing that SGD noise is inherently anti-correlated over time. Second, we explore the influence of these anti-correlations on the variance of weight fluctuations. We find that for directions in which the curvature of the loss exceeds a hyperparameter-dependent crossover value, the conventional results for uncorrelated noise are recovered. For relatively flat directions, however, the weight variance is significantly reduced, leading to considerably smaller loss fluctuations than predicted under the constant-weight-variance (uncorrelated-noise) assumption. Furthermore, we demonstrate that training with these anti-correlations enhances test performance, suggesting that the inherent noise structure induced by epoch-based training plays a crucial role in finding flatter minima that generalize better.
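The anti-correlation described in the abstract follows from the fact that, within one epoch of sampling without replacement, the mini-batch gradients average exactly to the full-batch gradient, so their noise terms must sum to zero. The following minimal sketch (not the authors' code; the quadratic toy loss, sample sizes, and variable names are illustrative assumptions) measures the empirical autocorrelation of mini-batch gradient noise across one epoch at a fixed weight vector, mirroring the paper's assumption that the noise is independent of small weight fluctuations.

```python
# Minimal sketch: empirical autocorrelation of mini-batch gradient noise
# under epoch-based (without-replacement) sampling, at fixed weights.
# The quadratic per-sample loss 0.5*(x_i^T w)^2 is an illustrative assumption.
import numpy as np

rng = np.random.default_rng(0)
n_samples, dim, batch = 1000, 10, 50
X = rng.normal(size=(n_samples, dim))   # synthetic per-sample data
w = rng.normal(size=dim)                # fixed weight vector

def minibatch_grad(idx):
    """Gradient of 0.5*(x_i^T w)^2 averaged over the batch: mean_i (x_i^T w) x_i."""
    return (X[idx] * (X[idx] @ w)[:, None]).mean(axis=0)

full_grad = (X * (X @ w)[:, None]).mean(axis=0)

# One epoch of sampling without replacement: a random permutation split into batches.
perm = rng.permutation(n_samples)
noise = np.array([minibatch_grad(perm[b:b + batch]) - full_grad
                  for b in range(0, n_samples, batch)])

# Autocorrelation of the noise across batch steps within the epoch.
# Because the batch noises of one epoch sum to zero, nonzero lags come out
# negative on average (anti-correlation), unlike i.i.d. with-replacement noise.
T = len(noise)
var = np.mean(np.sum(noise**2, axis=1))
for lag in range(3):
    c = np.mean([np.dot(noise[t], noise[t + lag]) for t in range(T - lag)])
    print(f"lag {lag}: normalized autocorrelation {c / var:+.3f}")
```

Under these assumptions the lag-0 value is 1 and nonzero lags hover around -1/(T-1), the signature of epoch-induced anti-correlation; averaging over many epochs and weight configurations would sharpen the estimate.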
Supplementary Material: zip
Primary Area: optimization
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 10253