Implicitly regularized interaction between SGD and the loss landscape geometry

16 May 2022 (modified: 05 May 2023) | NeurIPS 2022 Submitted | Readers: Everyone
Keywords: SGD, learning rate, batch size, optimization, generalization, implicit bias, implicit regularization, sharpness, scaling rule
TL;DR: We find that SGD induces an implicit regularization on the interaction between the gradient distribution and the loss landscape geometry, and we propose a more accurate scaling rule between batch size and learning rate.
Abstract: We study the unstable dynamics of stochastic gradient descent (SGD) and their impact on generalization in neural networks. We find that SGD induces an implicit regularization on the interaction between the gradient distribution and the geometry of the loss landscape. Moreover, based on an analysis of a concentration measure of the batch gradient, we propose a more accurate scaling rule between batch size and learning rate, the Linear and Saturation Scaling Rule (LSSR).
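For context, the proposed LSSR refines the commonly used linear scaling rule between batch size and learning rate. The sketch below shows only that conventional linear rule (function name and constants are illustrative); the saturation behaviour that LSSR adds is not specified in this abstract, so it is not modeled here.

```python
# Baseline linear scaling rule (Goyal et al., 2017): the learning rate grows
# proportionally with the batch size. The paper's LSSR adds a saturation
# regime, whose exact form is not given in this abstract, so only the
# conventional linear rule is sketched.

def linear_scaled_lr(base_lr: float, base_batch_size: int, batch_size: int) -> float:
    """Scale the learning rate linearly with the batch size."""
    return base_lr * (batch_size / base_batch_size)

# Example: a reference setup of lr=0.1 at batch size 256, scaled up to batch size 1024.
print(linear_scaled_lr(0.1, 256, 1024))  # -> 0.4
```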
Supplementary Material: zip

