Sharp Minima Can Generalize: A Loss Landscape Perspective On Data

Raymond Fan, Bryce Sandlund, Lin Myat Ko

Published: 2025, Last Modified: 11 May 2026CoRR 2025EveryoneRevisionsBibTeXCC BY-SA 4.0
Abstract: The volume hypothesis suggests deep learning is effective because it is likely to find flat minima due to their large volumes, and flat minima generalize well. This picture does not explain the role of large datasets in generalization. Measuring minima volumes under varying amounts of training data reveals sharp minima which generalize well exist, but are unlikely to be found due to their small volumes. Increasing data changes the loss landscape, such that previously small generalizing minima become (relatively) large.
Loading