Benign Oscillation of Stochastic Gradient Descent with Large Learning Rate

Published: 07 Nov 2023, Last Modified: 13 Dec 2023 · M3L 2023 Poster
Keywords: deep learning theory; large learning rate; oscillation of stochastic gradient descent
Abstract: In this work, we theoretically investigate the generalization properties of neural networks (NNs) trained by stochastic gradient descent (SGD) with \emph{large learning rates}. Under this training regime, we find that the \emph{oscillation} of the NN weights caused by SGD with large learning rates turns out to benefit the generalization of the NN, potentially improving over the same NN trained by SGD with small learning rates, which converges more smoothly. In view of this finding, we call this phenomenon ``\emph{benign oscillation}''.
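The oscillation the abstract refers to can be seen even in the simplest setting. A minimal illustrative sketch (not the paper's construction or analysis): gradient descent on a 1-D quadratic loss $f(w) = \frac{1}{2}\lambda w^2$ has the update $w \leftarrow w(1 - \eta\lambda)$, so the iterates flip sign at every step when $\eta\lambda > 1$, yet still converge as long as $\eta\lambda < 2$.

```python
def gd_trajectory(w0, lr, lam=1.0, steps=20):
    """Iterates of gradient descent on f(w) = 0.5 * lam * w**2."""
    ws = [w0]
    w = w0
    for _ in range(steps):
        w = w - lr * lam * w  # gradient step: grad f(w) = lam * w
        ws.append(w)
    return ws

# lr * lam = 0.5 < 1: monotone decay toward the minimum
small = gd_trajectory(1.0, lr=0.5)
# 1 < lr * lam = 1.5 < 2: sign-flipping oscillation that still shrinks
large = gd_trajectory(1.0, lr=1.5)

print(small[:4])  # positive and shrinking
print(large[:4])  # alternating sign, shrinking in magnitude
```

This toy example only shows that large learning rates produce oscillating iterates; the paper's contribution is the generalization analysis of this regime for neural networks trained by SGD, which the sketch does not attempt to reproduce.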
Submission Number: 28