Double Descent Revisited: When Noise Amplifies and Optimizers Decide

ICLR 2026 Conference Submission 17541 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Overparametrization, Double Descent, Noise
Abstract: We examine how the double descent phenomenon emerges across different architectures, optimizers, learning rate schedulers, and noise-robust losses. Previous studies have often attributed the interpolation peak to label noise. However, by systematically varying noise levels, optimizers, learning rate regimes, and training losses, we demonstrate that, while noise can amplify the effect, it is unlikely to be the driving factor behind double descent. Instead, optimization dynamics, notably the learning rate and the choice of optimizer, strongly influence whether a visible peak appears, often having a larger effect than adding label noise. Consistently, noise-robust losses partially mitigate double descent in settings where noise amplification is strong, whereas their impact is negligible otherwise. Expanding on recent work, this study further confirms that noise primarily degrades the linear separability of classes in feature space. Our results reconcile seemingly conflicting prior accounts and provide practical guidance: commonly used combinations of learning rates, schedulers, and losses can prevent double descent, even in noisy regimes. Furthermore, our study suggests that double descent may have a limited impact in practice. Our code is available at https://anonymous.4open.science/r/DDxNoise.
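As a rough illustration of the kind of sweep the abstract describes, the following is a minimal sketch (not the authors' released code): it varies model width, label-noise level, and optimizer on a synthetic dataset, and records clean test error to look for an interpolation peak. The dataset, widths, noise levels, and hyperparameters are illustrative assumptions.

```python
# Hypothetical sketch of a double-descent sweep; not the paper's implementation.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=2000, n_features=20, n_informative=10,
                           n_classes=2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

def add_label_noise(labels, p):
    """Flip a fraction p of the training labels uniformly at random."""
    noisy = labels.copy()
    flip = rng.random(len(noisy)) < p
    noisy[flip] = 1 - noisy[flip]
    return noisy

widths = [2, 4, 8, 16, 32, 64, 128, 256]                 # capacity sweep
for noise in (0.0, 0.2):                                 # label-noise levels
    for solver, lr in (("sgd", 0.01), ("adam", 0.001)):  # optimizer / LR regime
        y_noisy = add_label_noise(y_tr, noise)
        errs = []
        for w in widths:
            clf = MLPClassifier(hidden_layer_sizes=(w,), solver=solver,
                                learning_rate_init=lr, max_iter=500,
                                random_state=0)
            clf.fit(X_tr, y_noisy)                       # train on noisy labels
            errs.append(1.0 - clf.score(X_te, y_te))     # clean test error
        print(f"noise={noise:.1f} solver={solver:5s} "
              + " ".join(f"{e:.3f}" for e in errs))
```

In such a sweep, a double-descent curve would show test error rising near the width at which the model first interpolates the noisy training labels and falling again for larger widths; how pronounced that peak is depends on the optimizer and learning rate settings chosen above.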
Primary Area: learning theory
Submission Number: 17541