Class-Conditional Neuron Pre-Activation Divergence to rule out validation set in label noise early stopping
Keywords: Label noise, Early stopping, Neuron pre-activation, Memorization
TL;DR: We introduce CND, a metric that underpins a validation-free early stopping method for detecting when networks start memorizing noisy labels. CND peaks at the point of maximum generalization, and the resulting criterion outperforms prior validation-free approaches, especially on datasets with many classes and at low noise levels.
Abstract: Label noise poses a major challenge in supervised deep learning: models tend to memorize corrupted labels, leading to poor generalization. Early stopping can mitigate this but usually requires a clean validation set, which reduces the data available for training. We introduce Class-Conditional Neuron Pre-Activation Divergence (CND), a metric that measures the divergence between class-conditional and marginal distributions of neuron pre-activations. We show that pre-activations naturally form class-dependent modes, which collapse during memorization of noisy labels, making the distributions less class-dependent. Leveraging this insight, we propose a validation-free early stopping criterion that relies only on training data. Specifically, we observe that the CNDs of the last layer peak at the point of maximum generalization, so training can be halted without a held-out validation set and all available data is preserved for learning. Across benchmarks with symmetric and instance-dependent label noise, our method consistently outperforms other validation-free approaches, especially on datasets with many classes and at low noise levels. These results highlight the value of analyzing pre-activation distributions for understanding memorization and improving generalization.
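To make the idea in the abstract concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not say which divergence or density estimator is used, so this sketch assumes a histogram estimate of the KL divergence between each class-conditional distribution and the marginal, weighted by class frequency and averaged over neurons. The names `cnd_score`, `peak_epoch`, `n_bins`, and `patience` are hypothetical, and the patience-based peak detection is likewise an assumption about how a peak might be detected in practice.

```python
import numpy as np

def cnd_score(preacts, labels, n_bins=32):
    """Histogram-based divergence between class-conditional and marginal
    pre-activation distributions (assumed estimator, not from the paper).

    preacts: array of shape (n_samples, n_neurons), last-layer pre-activations
    labels:  array of shape (n_samples,), the (possibly noisy) training labels
    """
    preacts = np.asarray(preacts, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    n_neurons = preacts.shape[1]
    total = 0.0
    for j in range(n_neurons):
        z = preacts[:, j]
        lo, hi = z.min(), z.max()
        if hi == lo:
            continue  # a constant neuron carries no class information
        # Shared bin edges so conditional and marginal histograms align.
        edges = np.linspace(lo, hi, n_bins + 1)
        marg, _ = np.histogram(z, bins=edges)
        marg = marg / marg.sum()
        for c in classes:
            mask = labels == c
            cond, _ = np.histogram(z[mask], bins=edges)
            cond = cond / cond.sum()
            nz = cond > 0  # marg is also > 0 wherever cond is
            kl = np.sum(cond[nz] * np.log(cond[nz] / marg[nz]))
            total += mask.mean() * kl  # weight by empirical class frequency
    return total / n_neurons

def peak_epoch(scores, patience=3):
    """Return the epoch of the last maximum once the score has failed to
    improve for `patience` consecutive epochs, or None if no peak yet."""
    best, best_epoch = -np.inf, -1
    for epoch, s in enumerate(scores):
        if s > best:
            best, best_epoch = s, epoch
        elif epoch - best_epoch >= patience:
            return best_epoch
    return None
```

In a training loop, one would compute `cnd_score` once per epoch on the training set's last-layer pre-activations, append it to a list, and stop (restoring the checkpoint from the returned epoch) as soon as `peak_epoch` returns a value. Under the frequency-weighted KL assumed here, the per-neuron quantity is the mutual information between the discretized pre-activation and the label, so it is near zero when the distributions are not class-dependent.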
Supplementary Material: zip
Primary Area: optimization
Submission Number: 11527