Keywords: Sharpness-Aware Minimization, Implicit Bias, Training Dynamics
TL;DR: We study the implicit bias of SAM during the late phase of training, revealing that SAM efficiently selects flatter minima than SGD even when applied only in the last few epochs.
Abstract: Sharpness-Aware Minimization (SAM) has substantially improved the generalization of neural networks under various settings.
Despite this success, the mechanism behind its effectiveness remains poorly understood.
In this work, we uncover an intriguing phenomenon in the training dynamics of SAM, shedding light on its implicit bias towards flatter minima relative to Stochastic Gradient Descent (SGD).
Specifically, we find that *SAM efficiently selects flatter minima late in training*.
Remarkably, even a few epochs of SAM applied at the end of training yield nearly the same generalization and solution sharpness as full SAM training.
Subsequently, we delve deeper into the mechanism underlying this phenomenon.
Theoretically, we identify two phases in the learning dynamics after applying SAM late in training: i) SAM first escapes the minimum found by SGD exponentially fast; and ii) it then rapidly converges to a flatter minimum within the same valley.
Furthermore, we empirically investigate the role of SAM during the early training phase.
We conjecture that the optimization method used in the late phase of training plays a more crucial role in shaping the final solution's properties than the method used early on.
Based on this viewpoint, we extend our findings from SAM to Adversarial Training.
We provide source code in the supplementary materials and will release checkpoints in the future; a minimal, illustrative sketch of the late-phase SGD-to-SAM switch is shown below.
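The sketch below illustrates the late-phase switch described in the abstract: train with plain SGD for most epochs, then apply a basic SAM update for the final few. It is not our released implementation; names such as `model`, `loader`, `loss_fn`, `sgd_opt`, `rho`, and `SAM_EPOCHS` are placeholder assumptions.
```python
# Minimal sketch: plain SGD for most of training, a basic SAM update for the
# last SAM_EPOCHS epochs. All names here are illustrative placeholders.
import torch

def sam_step(model, loss_fn, x, y, base_opt, rho=0.05):
    """One SAM step: w <- w - eta * grad L(w + rho * g / ||g||), where g = grad L(w)."""
    # First forward/backward pass: gradient g at the current weights w.
    base_opt.zero_grad()
    loss_fn(model(x), y).backward()

    # Ascent step: perturb the weights to w + rho * g / ||g||.
    params = [p for p in model.parameters() if p.grad is not None]
    grad_norm = torch.norm(torch.stack([p.grad.norm() for p in params])) + 1e-12
    with torch.no_grad():
        eps = [rho * p.grad / grad_norm for p in params]
        for p, e in zip(params, eps):
            p.add_(e)

    # Second forward/backward pass: gradient at the perturbed weights.
    base_opt.zero_grad()
    loss_fn(model(x), y).backward()

    # Restore the original weights and take the base (SGD) step using the
    # gradient evaluated at the perturbed point.
    with torch.no_grad():
        for p, e in zip(params, eps):
            p.sub_(e)
    base_opt.step()

def train(model, loader, loss_fn, sgd_opt, total_epochs, SAM_EPOCHS=5, rho=0.05):
    # Switch from plain SGD to SAM only for the final SAM_EPOCHS epochs.
    for epoch in range(total_epochs):
        use_sam = epoch >= total_epochs - SAM_EPOCHS
        for x, y in loader:
            if use_sam:
                sam_step(model, loss_fn, x, y, sgd_opt, rho=rho)
            else:
                sgd_opt.zero_grad()
                loss_fn(model(x), y).backward()
                sgd_opt.step()
```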
Supplementary Material: zip
Primary Area: other topics in machine learning (i.e., none of the above)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 1483