On the Role of Label Noise in the Feature Learning Process

Published: 01 May 2025, Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: We theoretically characterize the role of label noise in training neural networks from a feature learning perspective, identifying two stages in the training dynamics.
Abstract: Deep learning with noisy labels presents significant challenges. In this work, we theoretically characterize the role of label noise from a feature learning perspective. Specifically, we consider a signal-noise data distribution, where each sample comprises a label-dependent signal and label-independent noise, and rigorously analyze the training dynamics of a two-layer convolutional neural network under this data setup in the presence of label noise. Our analysis identifies two key stages. In Stage I, the model perfectly fits all the clean samples (i.e., samples without label noise) while ignoring the noisy ones (i.e., samples with noisy labels). During this stage, the model learns the signal from the clean samples, yielding good generalization on unseen data. In Stage II, as the training loss converges, the gradient in the direction of noise surpasses that of the signal, leading to overfitting on noisy samples. Eventually, the model memorizes the noise present in the noisy samples, degrading its generalization ability. Furthermore, our analysis provides a theoretical basis for two widely used techniques for tackling label noise: early stopping and sample selection. Experiments on both synthetic and real-world setups validate our theory.
Lay Summary: Training deep learning models with incorrect (noisy) labels is a major challenge, often leading to poor performance. In this work, we take a closer look at how such models learn when some of the training labels are wrong. We study a simple yet representative neural network and show that the learning process happens in two distinct stages. In the first stage, the model learns useful patterns from correctly labeled data while largely ignoring the incorrect ones—this leads to good performance on new, unseen data. But if training continues too long, the model enters a second stage where it starts to learn from the wrong labels. This causes it to "memorize the noise," ultimately reducing its ability to generalize. Our analysis also sheds light on why two popular strategies—early stopping (ending training before the second stage) and sample selection (focusing on more reliable data)—are effective ways to handle label noise. We support our findings with both simulated and real-world experiments.
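The signal-noise data distribution described in the abstract can be sketched in a few lines of code. The sketch below is illustrative only: the parameter names and values (`signal_strength`, `noise_rate`, the two-patch layout, and the fixed signal direction) are assumptions for the example, not the paper's exact construction.

```python
import numpy as np

def make_signal_noise_data(n=100, d=20, noise_rate=0.2,
                           signal_strength=1.0, noise_std=0.5, seed=0):
    """Each sample has two patches: a label-dependent signal patch
    (y * mu) and a label-independent Gaussian noise patch. A fraction
    noise_rate of the observed labels is then flipped to simulate
    label noise."""
    rng = np.random.default_rng(seed)
    mu = np.zeros(d)
    mu[0] = signal_strength                      # fixed signal direction
    y_true = rng.choice([-1, 1], size=n)         # clean binary labels
    signal_patch = y_true[:, None] * mu          # label-dependent signal
    noise_patch = noise_std * rng.standard_normal((n, d))  # label-independent noise
    X = np.stack([signal_patch, noise_patch], axis=1)      # shape (n, 2, d)
    flip = rng.random(n) < noise_rate            # which labels get corrupted
    y_obs = np.where(flip, -y_true, y_true)      # observed (possibly noisy) labels
    return X, y_obs, y_true, flip

X, y_obs, y_true, flip = make_signal_noise_data()
```

Samples with `flip == True` are the "noisy" samples that, per the two-stage analysis, the model ignores early in training but eventually memorizes.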
Link To Code: https://github.com/zzp1012/label-noise-theory
Primary Area: Theory->Deep Learning
Keywords: Label noise, Feature Learning, Training Dynamics
Submission Number: 4799