Beyond Implicit Bias: The Insignificance of SGD Noise in Online Learning

22 Sept 2023 (modified: 25 Mar 2024) | ICLR 2024 Conference Withdrawn Submission
Keywords: online learning, large learning rate, batch size, implicit bias of SGD
TL;DR: High learning rate and small batch size are not beneficial in online learning. Further, there are strong similarities in the trajectories across learning rates and batch sizes.
Abstract: The success of SGD in deep learning has been ascribed by prior works to the *implicit bias* induced by a high learning rate or small batch size ("SGD noise"). While prior works focused on *offline learning* (i.e., multi-epoch training), we study the impact of SGD noise on *online* (i.e., single-epoch) learning. Through an extensive empirical analysis of image and language data, we demonstrate that a large learning rate and small batch size do *not* confer any implicit bias advantages in online learning. In contrast to offline learning, the benefits of SGD noise in online learning are strictly computational, facilitating larger or more cost-effective gradient steps. This suggests that SGD in the online regime can be construed as taking noisy steps along the "golden path" of the noiseless *gradient flow* algorithm. We study this hypothesis and provide supporting evidence in function space: we conduct experiments that reduce SGD noise during training, and we measure the pointwise functional distance between models trained with varying SGD noise levels but at equivalent loss values. Our findings challenge the prevailing understanding of SGD and offer novel insights into its role in online learning.
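For concreteness, below is a minimal sketch of the abstract's functional-distance measurement: comparing two checkpoints trained with different SGD noise levels but stopped at matched loss values, pointwise on a shared evaluation batch. The architecture, the data, and the choice of mean L2 distance and prediction-disagreement rate are illustrative assumptions, not the authors' actual protocol.

```python
# Hypothetical sketch (not the authors' code): measuring the pointwise
# functional distance between two models that were trained with different
# SGD noise levels (learning rate / batch size) but selected at equivalent
# loss values, as described in the abstract.
import torch
import torch.nn as nn


def functional_distance(model_a: nn.Module,
                        model_b: nn.Module,
                        inputs: torch.Tensor) -> dict:
    """Compare two models pointwise on a shared evaluation batch."""
    model_a.eval()
    model_b.eval()
    with torch.no_grad():
        out_a = model_a(inputs)
        out_b = model_b(inputs)
    # Mean L2 distance between the models' output (logit) vectors.
    mean_l2 = (out_a - out_b).norm(dim=-1).mean().item()
    # Fraction of inputs on which the predicted classes differ.
    disagreement = (out_a.argmax(-1) != out_b.argmax(-1)).float().mean().item()
    return {"mean_l2": mean_l2, "disagreement": disagreement}


if __name__ == "__main__":
    torch.manual_seed(0)
    # Stand-ins for checkpoints trained with high vs. low SGD noise,
    # selected at matched training-loss values (architecture is arbitrary).
    def make_net() -> nn.Module:
        return nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))

    high_noise_model, low_noise_model = make_net(), make_net()
    eval_batch = torch.randn(256, 32)  # shared held-out evaluation inputs
    print(functional_distance(high_noise_model, low_noise_model, eval_batch))
```

Comparing checkpoints at matched loss rather than at a matched step count is the key control here: it separates implicit-bias effects (different functions at the same loss) from the purely computational speedups the abstract attributes to SGD noise in the online regime.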
Supplementary Material: zip
Primary Area: general machine learning (i.e., none of the above)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 5885