A Large Deviation Theory Analysis on the Implicit Bias of SGD

Andres R Masegosa; Luis A. Ortega

A Large Deviation Theory Analysis on the Implicit Bias of SGD

Andres R Masegosa, Luis A. Ortega

27 Sept 2024 (modified: 13 Nov 2024)ICLR 2025 Conference Withdrawn SubmissionEveryoneRevisionsBibTeXCC BY 4.0

Keywords: implicit bias, implicit regularization, optimization, stochastic gradient descent, large deviation theory

Abstract: Stochastic Gradient Descent (SGD) plays a key role in training deep learning models, yet its ability to implicitly regularize and enhance generalization remains an open theoretical question. We apply Large Deviation Theory (LDT) to analyze why SGD selects models with strong generalization properties. We show that the generalization error jointly depends on the level of concentration of its empirical loss around its expected value and the \textit{abnormality} of the random deviations stemming from the stochastic nature of the training data observation process. Our analysis reveals that SGD gradients are inherently biased toward models exhibiting more concentrated losses and less abnormal and smaller random deviations. These theoretical insights are empirically validated using deep convolutional neural networks, confirming that mini-batch training acts as a natural regularizer by preventing convergence to models with high generalization errors.

Primary Area: learning theory

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.

Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 11297

Loading