On the Learning Dynamics of Two-layer Linear Networks with Label Noise SGD

Published: 09 Jun 2025, Last Modified: 09 Jun 2025 · HiLD at ICML 2025 Poster · CC BY 4.0
Keywords: Label Noise SGD, Two-layer Linear Network, Learning Dynamics
Abstract: One crucial factor behind the success of deep learning lies in the implicit bias induced by the noise inherent in gradient-based training algorithms. Motivated by empirical observations that training with noisy labels improves model generalization, we investigate the underlying mechanisms of stochastic gradient descent (SGD) with label noise. Focusing on a two-layer over-parameterized linear network, we analyze the learning dynamics of label noise SGD and uncover a two-phase learning behavior. In *Phase I*, the magnitudes of the model weights progressively diminish, and the model escapes the lazy regime and enters the rich regime. In *Phase II*, the alignment between the model weights and the ground-truth interpolator increases, and the model eventually converges. Our analysis highlights the critical role of label noise in driving the transition from the lazy to the rich regime and offers a minimal explanation of its empirical success. Extensive experiments in both synthetic and real-world setups strongly support our theory.
Student Paper: Yes
Submission Number: 55
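As a rough illustration of the algorithm the abstract describes, below is a minimal sketch of label noise SGD on a two-layer linear network: at each step one training example is drawn and Gaussian noise is added to its label before the gradient update. This is an assumed toy setup, not the paper's exact construction; the dimensions, learning rate, noise scale `sigma`, and sparse ground-truth vector `beta_star` are all illustrative choices. The script tracks the norm of the end-to-end predictor (which should shrink in Phase I) and its alignment with the ground truth (which should grow in Phase II).

```python
# Minimal sketch of label noise SGD on a two-layer over-parameterized
# linear network. All hyperparameters here are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

n, d, h = 20, 50, 100                      # samples, input dim, hidden width (h > d: over-parameterized)
X = rng.normal(size=(n, d)) / np.sqrt(d)   # random inputs
beta_star = np.zeros(d)
beta_star[:5] = 1.0                        # assumed sparse ground-truth interpolator
y = X @ beta_star                          # clean labels

W = rng.normal(size=(h, d)) * 0.5          # first-layer weights
a = rng.normal(size=h) * 0.5               # second-layer weights

lr, sigma, steps = 0.05, 0.5, 5000         # learning rate, label-noise scale, iterations
for t in range(steps):
    i = rng.integers(n)                    # sample one training example
    y_noisy = y[i] + sigma * rng.normal()  # label noise: perturb the target each step
    hidden = W @ X[i]
    err = a @ hidden - y_noisy             # residual of the prediction a^T W x
    # Gradients of the squared loss 0.5 * err^2 with respect to each layer.
    grad_a = err * hidden
    grad_W = err * np.outer(a, X[i])
    a -= lr * grad_a
    W -= lr * grad_W

    if t % 1000 == 0:
        beta_eff = W.T @ a                 # end-to-end linear predictor
        align = beta_eff @ beta_star / (
            np.linalg.norm(beta_eff) * np.linalg.norm(beta_star)
        )
        print(f"step {t:5d} | weight norm {np.linalg.norm(beta_eff):.3f} "
              f"| alignment {align:.3f}")
```

Running the sketch, one would expect the printed weight norm to decrease early in training and the alignment with `beta_star` to increase later, mirroring the two phases described in the abstract.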