Generalization-Aware Minimization

18 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: generalization, sharpness aware minimization, loss landscape, optimization
TL;DR: We design a generalized version of sharpness aware minimization that directly optimizes the expected test loss landscape, enhancing generalization.
Abstract: Sharpness-Aware Minimization (SAM) optimizers have improved neural network generalization relative to stochastic gradient descent (SGD). The goal of SAM is to steer model parameters away from sharp regions of the training loss landscape, which are believed to generalize poorly. However, the underlying mechanisms of SAM, including whether its bias toward flatter regions is the reason it improves generalization, are not fully understood. In this work, we introduce Generalization-Aware Minimization (GAM), derived directly from the goal of guiding model parameters toward regions of the landscape that generalize better. We do so by showing mathematically, through a Bayesian derivation, that the landscape of expected true (test) loss is a rescaled version of the observed training loss landscape, and that a sequence of perturbative updates in place of SAM's single perturbative update can optimize the expected test loss. We present a practical online algorithm to implement GAM's perturbative steps during training. Finally, we empirically demonstrate that GAM outperforms SAM, improving generalization on a range of benchmarks. We believe that GAM provides valuable insights into how sharpness-based algorithms improve generalization, is a superior optimizer for generalization, and may inspire the development of still-better optimizers.
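To make the contrast in the abstract concrete, below is a minimal PyTorch sketch of the standard SAM training step (single gradient-ascent perturbation eps = rho * grad / ||grad||, followed by a descent update computed at the perturbed weights), extended with a loop that applies several smaller ascent steps in sequence. The multi-step path and the names `sam_like_step`, `num_perturb_steps`, and the per-step rho scaling are illustrative assumptions only; they are not the paper's GAM algorithm or its online implementation, just a rough picture of what replacing SAM's single perturbative update with a sequence of perturbative updates could look like.

```python
import torch

def perturbation(params, grads, rho):
    """SAM-style ascent direction: eps = rho * g / ||g||, norm taken over all parameters."""
    grad_norm = torch.sqrt(sum((g ** 2).sum() for g in grads)) + 1e-12
    return [rho * g / grad_norm for g in grads]

def sam_like_step(model, loss_fn, data, target, optimizer, rho=0.05, num_perturb_steps=1):
    """With num_perturb_steps=1 this is the usual SAM update; larger values sketch a
    hypothetical sequence of perturbative ascent steps (an assumption for illustration)."""
    params = [p for p in model.parameters() if p.requires_grad]
    total_eps = [torch.zeros_like(p) for p in params]

    for _ in range(num_perturb_steps):
        optimizer.zero_grad()
        loss_fn(model(data), target).backward()
        eps = perturbation(params, [p.grad for p in params], rho / num_perturb_steps)
        with torch.no_grad():
            for p, e, te in zip(params, eps, total_eps):
                p.add_(e)   # climb the training loss around the current weights
                te.add_(e)  # accumulate the total perturbation for later undo

    # Gradient at the perturbed point drives the actual descent update.
    optimizer.zero_grad()
    loss_fn(model(data), target).backward()
    with torch.no_grad():
        for p, te in zip(params, total_eps):
            p.sub_(te)      # restore the original weights before stepping
    optimizer.step()
    optimizer.zero_grad()
```

A training loop would call `sam_like_step` once per minibatch in place of the usual forward/backward/step sequence; note that each perturbative step costs one extra forward and backward pass.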
Primary Area: optimization
Submission Number: 13499