Generalization Aware Minimization

Akhilan Boopathy; Ila R Fiete

26 Sept 2024 (modified: 05 Feb 2025). Submitted to ICLR 2025. License: CC BY 4.0.

Keywords: generalization, sharpness aware minimization, loss landscape, optimization

TL;DR: We design a generalized version of sharpness-aware minimization that directly optimizes the expected test loss landscape, enhancing generalization.

Abstract: Sharpness-Aware Minimization (SAM) algorithms have effectively improved neural network generalization by steering model parameters away from sharp regions of the training loss landscape, which tend to generalize poorly. However, the underlying mechanisms of SAM are not fully understood, and recent studies question whether its bias toward flatter regions is what explains its improved generalization. In this work, we introduce Generalization-Aware Minimization (GAM), a generalized version of SAM that employs multiple perturbation steps instead of SAM's single-step perturbation. This allows GAM to directly guide model parameters toward areas of the landscape that generalize better. We show that the expected true (test) loss landscape is a rescaled version of the observed training loss landscape and demonstrate how GAM's multiple perturbative updates can be designed to optimize this expected true loss. Finally, we present a practical online algorithm that adapts GAM's perturbative steps during training to improve generalization, and we empirically validate its superior performance over SAM on benchmark datasets. We believe GAM sheds light on the generalization improvements of sharpness-based algorithms and can inspire the development of optimizers with even better generalization.
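To make the contrast between a single-step and a multi-step perturbation concrete, here is a minimal sketch in plain Python. `sam_step` follows the standard SAM update: take one normalized ascent step of radius `rho` along the gradient, then descend using the gradient evaluated at that perturbed point. `multistep_perturbed_step` is a hypothetical illustration of "multiple perturbation steps" that chains several normalized ascent steps before the descent step. It is NOT the GAM update from this submission, whose exact form the abstract does not give. The function names, the chained-ascent form, the toy quadratic loss, and the step sizes `lr`, `rho`, and `rhos` are all assumptions made for illustration.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
GradFn = Callable[[Sequence[float]], Vector]


def _norm(v: Sequence[float]) -> float:
    """Euclidean norm of a vector."""
    return math.sqrt(sum(x * x for x in v))


def _axpy(a: float, x: Sequence[float], y: Sequence[float]) -> Vector:
    """Return y + a * x elementwise."""
    return [yi + a * xi for xi, yi in zip(x, y)]


def sam_step(w: Sequence[float], grad_fn: GradFn,
             lr: float = 0.1, rho: float = 0.05) -> Vector:
    """Standard SAM update: one normalized ascent step, then descend."""
    g = grad_fn(w)
    w_adv = _axpy(rho / (_norm(g) + 1e-12), g, w)  # w + rho * g / ||g||
    return _axpy(-lr, grad_fn(w_adv), w)           # w - lr * grad(w_adv)


def multistep_perturbed_step(w: Sequence[float], grad_fn: GradFn,
                             lr: float = 0.1,
                             rhos: Sequence[float] = (0.05, 0.05)) -> Vector:
    """Hypothetical multi-step variant (assumption, not the paper's GAM):
    chain one normalized ascent step per entry of `rhos`, then descend
    from the original w using the gradient at the final perturbed point."""
    w_adv = list(w)
    for rho in rhos:
        g = grad_fn(w_adv)
        w_adv = _axpy(rho / (_norm(g) + 1e-12), g, w_adv)
    return _axpy(-lr, grad_fn(w_adv), w)


if __name__ == "__main__":
    # Toy ill-conditioned quadratic: L(w) = 0.5 * (w0^2 + 10 * w1^2).
    def grad_fn(w: Sequence[float]) -> Vector:
        return [w[0], 10.0 * w[1]]

    def loss(w: Sequence[float]) -> float:
        return 0.5 * (w[0] ** 2 + 10.0 * w[1] ** 2)

    w0 = [1.0, 1.0]
    print("initial loss:", loss(w0))
    print("after SAM step:", loss(sam_step(w0, grad_fn)))
    print("after 2-step variant:", loss(multistep_perturbed_step(w0, grad_fn)))
```

With a single entry in `rhos`, the hypothetical variant reduces exactly to the SAM step, which is one way to read the claim that a multi-step scheme generalizes SAM. How the individual perturbation steps should be chosen or adapted online is the subject of the submission itself and is not reproduced here.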

Primary Area: other topics in machine learning (i.e., none of the above)

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 8047
