A Unified Stability Analysis of SAM vs SGD: Role of Data Coherence and Emergence of Simplicity Bias

Published: 18 Sept 2025, Last Modified: 29 Oct 2025 · NeurIPS 2025 poster · CC BY 4.0
Keywords: Stochastic Gradient Descent (SGD), Sharpness-Aware Minimization (SAM), Linear Stability Analysis, Two-Layer ReLU Networks, Stability and Convergence
Abstract: Understanding the dynamics of optimization algorithms in deep learning has become increasingly critical, especially as models grow in scale and complexity. Despite the empirical success of stochastic gradient descent (SGD) and its variants in finding solutions that generalize well, the precise mechanisms underlying this generalization remain poorly understood. A particularly intriguing aspect of this phenomenon is the bias of optimization algorithms towards certain types of minima—often flatter or simpler—especially in overparameterized regimes. While prior works have associated flatness of the loss landscape with better generalization, tools to mechanistically connect data, optimization algorithms, and the nature of the resulting minima are still limited. For instance, methods like Sharpness-Aware Minimization (SAM) have shown practical gains by explicitly promoting flatness, but lack a unified theoretical framework explaining their influence across different data structures and model architectures. In this work, we introduce a comprehensive linear stability analysis framework to dissect the behavior of optimization algorithms—SGD, random perturbations, and SAM—in neural networks, focusing particularly on two-layer ReLU models. Our approach is built upon a novel coherence measure that captures the interaction between data geometry and gradient similarity, providing new insights into why and how certain solutions are favored.
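For reference, the two update rules being contrasted take the following standard forms; the notation (mini-batch loss $L_{\mathcal{B}_t}$, step size $\eta$, SAM perturbation radius $\rho$) is assumed here and not taken from the paper body:
$$
w_{t+1} = w_t - \eta\,\nabla L_{\mathcal{B}_t}(w_t) \qquad \text{(SGD)}
$$
$$
w_{t+1} = w_t - \eta\,\nabla L_{\mathcal{B}_t}\!\left(w_t + \rho\,\frac{\nabla L_{\mathcal{B}_t}(w_t)}{\lVert \nabla L_{\mathcal{B}_t}(w_t)\rVert}\right) \qquad \text{(SAM)}
$$
SAM's inner ascent step explicitly seeks parameters whose entire neighborhood has low loss, which is the flatness-promoting behavior the proposed stability analysis aims to explain alongside SGD and random perturbations.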
Supplementary Material: zip
Primary Area: Optimization (e.g., convex and non-convex, stochastic, robust)
Submission Number: 24742