Simplicity Bias and Optimization Threshold in Two-Layer ReLU Networks

Published: 01 May 2025, Last Modified: 18 Jun 2025. ICML 2025 Poster. License: CC BY 4.0
Abstract: Understanding the generalization of overparametrized models remains a fundamental challenge in machine learning. The literature mostly studies generalization from an interpolation point of view, taking convergence towards a global minimum of the training loss for granted. This interpolation paradigm does not seem valid for complex tasks such as in-context learning or diffusion. It has instead been empirically observed that trained models go from global minima to spurious local minima of the training loss as the number of training samples exceeds a level we call the optimization threshold. This paper theoretically explores this phenomenon in the context of two-layer ReLU networks. We demonstrate that, despite overparametrization, networks may converge towards simpler solutions rather than interpolating the training data, which leads to a drastic improvement in the test loss. Our analysis relies on the so-called early alignment phase, during which neurons align towards specific directions. This directional alignment leads to a simplicity bias, wherein the network approximates the ground truth model without converging to the global minimum of the training loss. Our results suggest that this bias, which results in an optimization threshold beyond which interpolation is no longer reached, is beneficial and enhances the generalization of trained models.
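The sketch below is an illustrative toy experiment (not the authors' code; see the repository linked below for the actual implementation) of the setting described in the abstract: a two-layer ReLU network trained by gradient descent from a small initialization, where the dataset, width, learning rate, and teacher direction are all assumed for illustration. With small initialization, the early alignment phase tends to cluster neuron directions, which is the mechanism behind the simplicity bias discussed in the paper.

```python
# Minimal sketch of a two-layer ReLU network y(x) = sum_j a_j * relu(<w_j, x>),
# trained from a small initialization so the early alignment phase is visible.
# All hyperparameters and the teacher model are illustrative assumptions.
import torch

torch.manual_seed(0)
n, d, m = 50, 2, 100                               # samples, input dim, hidden width (assumed)
X = torch.randn(n, d)
teacher = torch.tensor([1.0, 0.0])
y = torch.relu(X @ teacher)                        # toy single-neuron ground truth

scale = 1e-4                                       # small init -> early alignment regime
W = (scale * torch.randn(m, d)).requires_grad_()   # hidden-layer weights
a = (scale * torch.randn(m)).requires_grad_()      # output-layer weights

opt = torch.optim.SGD([W, a], lr=0.05)
for step in range(20000):
    pred = torch.relu(X @ W.T) @ a                 # network output on training data
    loss = 0.5 * ((pred - y) ** 2).mean()          # quadratic training loss
    opt.zero_grad(); loss.backward(); opt.step()

# Directional alignment: cosine similarity between each neuron w_j and the
# teacher direction; with small initialization, neurons concentrate on few directions.
dirs = torch.nn.functional.normalize(W.detach(), dim=1)
print("final training loss:", loss.item())
print("max neuron alignment with teacher:", (dirs @ teacher).max().item())
```

In this regime one can compare the final training loss against an interpolating (large-initialization) run: the aligned network may not drive the training loss to zero, yet it recovers the teacher direction, mirroring the trade-off the paper analyzes.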
Lay Summary: Neural networks are known for their ability to fit training data exactly, yet still make accurate predictions on new data. Surprisingly, in many modern models, this perfect fitting doesn’t always happen. Our research explores this behavior in a simplified setting using small neural networks. We show that when enough data is available, these networks often settle for simpler solutions that don’t fully match the training data but generalize better. This “simplicity bias” helps explain why such models perform well in real-world tasks.
Link To Code: https://github.com/eboursier/simplicity_bias
Primary Area: Deep Learning->Theory
Keywords: Neural Networks, Simplicity Bias, Implicit Bias, One-Hidden-Layer ReLU Network, Early Alignment
Submission Number: 7116