Sharpness Can Be Manipulated and Misleading for Generalization

18 Sept 2025 (modified: 12 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: generalization, sharpness, hessian, activation sparsity, activation frequency, data augmentation
TL;DR: This paper presents several counterexamples to the correlation between flat minima and generalization, showing that the sharpness metric can be manipulated.
Abstract: Sharpness, commonly measured by the Hessian matrix, has long been hypothesized to correlate with generalization. However, this work presents several counterexamples where Hessian-based sharpness can be manipulated. We derive a formula for the Hessian trace, revealing its dependence on several key factors: the norm of network weights, activation frequency, and the entropy of the output distribution. By manipulating these factors, we construct scenarios where models reside in flat minima yet exhibit overfitting and poor test performance. Moreover, Gaussian noise injection reduces Hessian trace within the first epoch and can even yield arbitrarily flat minima without corresponding improvement in generalization. This suggests that sharpness may be correlated with generalization without being causally responsible for it.
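The abstract measures sharpness via the Hessian trace. As a minimal sketch of how that quantity is typically probed in practice (this is a standard technique, not the paper's specific formula), the sketch below uses Hutchinson's estimator, which recovers tr(H) from Hessian-vector products alone: for a symmetric H and a random vector v with i.i.d. Rademacher entries, E[vᵀHv] = tr(H). The matrix H here is a synthetic stand-in for a network's loss Hessian.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 50
A = rng.standard_normal((d, d))
H = A @ A.T  # symmetric PSD stand-in for a loss Hessian

def hutchinson_trace(hvp, dim, n_samples=2000, rng=rng):
    """Estimate tr(H) using only Hessian-vector products hvp(v)."""
    est = 0.0
    for _ in range(n_samples):
        v = rng.choice([-1.0, 1.0], size=dim)  # Rademacher probe vector
        est += v @ hvp(v)                      # v^T H v sample
    return est / n_samples

approx = hutchinson_trace(lambda v: H @ v, d)
exact = float(np.trace(H))
print(exact, approx)
```

In a deep-learning setting, the closure `lambda v: H @ v` would be replaced by an autodiff Hessian-vector product, so the full Hessian never needs to be materialized.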
Primary Area: optimization
Submission Number: 11811