Understanding Sharpness Dynamics in NN Training with a Minimalist Example: The Effects of Dataset Difficulty, Depth, Stochasticity, and More

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: We propose a minimalist model that replicates the progressive sharpening and edge-of-stability phenomena, and we empirically and theoretically analyze how problem parameters affect progressive sharpening.
Abstract: When training deep neural networks with gradient descent, sharpness often increases---a phenomenon known as *progressive sharpening*---before saturating at the *edge of stability*. Although commonly observed in practice, the underlying mechanisms behind progressive sharpening remain poorly understood. In this work, we study this phenomenon using a minimalist model: a deep linear network with a single neuron per layer. We show that this simple model effectively captures the sharpness dynamics observed in recent empirical studies, offering a simple testbed to better understand neural network training. Moreover, we theoretically analyze how dataset properties, network depth, stochasticity of optimizers, and step size affect the degree of progressive sharpening in the minimalist model. We then empirically demonstrate how these theoretical insights extend to practical scenarios. This study offers a deeper understanding of sharpness dynamics in neural network training, highlighting the interplay between depth, training data, and optimizers.
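The sketch below illustrates the kind of minimalist setting the abstract describes: a deep linear network with a single neuron per layer, i.e. a scalar map f(x) = (w_L · … · w_1) x, trained with full-batch gradient descent on a 1-D regression task while tracking sharpness, understood here as the largest eigenvalue of the training-loss Hessian. This is not the authors' released code (see the repository linked below for that); the depth, step size, initialization, and synthetic data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
depth, eta, steps = 5, 0.04, 1000        # network depth, GD step size, iterations (illustrative)
x = rng.normal(size=32)                  # synthetic 1-D inputs
y = 3.0 * x                              # targets from a linear teacher with slope 3
A, B = np.mean(x * x), np.mean(x * y)    # sufficient statistics of the dataset

w = np.full(depth, 0.7)                  # one scalar weight per layer, balanced initialization

def loss_grad_hess(w):
    p = np.prod(w)                       # end-to-end map: f(x) = p * x
    g = p / w                            # dp/dw_k = product of the other weights
    r = A * p - B                        # derivative of the loss w.r.t. p
    loss = 0.5 * np.mean((p * x - y) ** 2)
    grad = r * g
    # Hessian of the loss in the weights: H_kj = A g_k g_j + r * p / (w_k w_j) for k != j,
    # and H_kk = A g_k^2 (the second derivative of p w.r.t. a single w_k vanishes).
    H = A * np.outer(g, g) + r * (np.outer(g, g) - np.diag(g * g)) / p
    return loss, grad, H

for t in range(steps):
    loss, grad, H = loss_grad_hess(w)
    sharpness = np.linalg.eigvalsh(H)[-1]    # largest Hessian eigenvalue
    if t % 100 == 0:
        print(f"step {t:4d}  loss {loss:.4f}  sharpness {sharpness:6.2f}  (2/eta = {2/eta:.0f})")
    w = w - eta * grad                       # full-batch gradient descent
```

With this small step size the printed sharpness grows as the loss decreases (progressive sharpening) while staying below the stability threshold 2/eta. According to the edge-of-stability picture described in the abstract, rerunning with a larger step size, so that 2/eta falls below the sharpness the model would otherwise reach, should instead make the sharpness saturate near 2/eta.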
Lay Summary: When training neural networks, researchers have observed a consistent pattern: models become increasingly “sharp,” meaning more sensitive to small changes, before reaching a stable state. This process, called progressive sharpening, is common in practice but not yet well understood. In our study, we investigate this phenomenon using a highly simplified model: a neural network with just one neuron per layer. Remarkably, this minimal setup replicates the sharpening behavior seen in much larger networks, making it a valuable tool for theoretical analysis. We use this setup to study how sharpening is influenced by factors such as training data difficulty, network depth, and randomness in the training process. We also develop new mathematical tools that explain when and why sharpening occurs, and we show that these predictions remain valid in more realistic settings. By providing theoretical foundations for a widely observed yet puzzling phenomenon, our work helps deepen the understanding of neural network dynamics and can guide the design of more reliable learning algorithms.
Link To Code: https://github.com/Yoogeonhui/understand_progressive_sharpening/
Primary Area: Deep Learning->Theory
Keywords: progressive sharpening, sharpness
Submission Number: 15877