Simplicity bias in $1$-hidden layer neural networks

Published: 01 Feb 2023, Last Modified: 13 Feb 2023. Submitted to ICLR 2023.
Keywords: Simplicity Bias, Neural Network, Gradient Descent
Abstract: Recent works \citep{shah2020pitfalls,chen2021intriguing} have demonstrated that neural networks exhibit extreme \emph{simplicity bias} (SB): they learn \emph{only the simplest} features to solve the task at hand, even in the presence of other, more robust but more complex features. Due to the lack of a general and rigorous definition of \emph{features}, these works showcase SB on \emph{semi-synthetic} datasets such as Color-MNIST and MNIST-CIFAR, where defining features is relatively easy. In this work, we rigorously define as well as thoroughly establish SB for \emph{one-hidden-layer} neural networks. More concretely, (i) we define SB as the network essentially being a function of a low-dimensional projection of the inputs; (ii) theoretically, we show that when the data is linearly separable, the network primarily depends only on the linearly separable ($1$-dimensional) subspace, even in the presence of an arbitrarily large number of other, more complex features that could have led to a significantly more robust classifier; (iii) empirically, we show that models trained on \emph{real} datasets such as Imagenette and Waterbirds-Landbirds indeed depend on a low-dimensional projection of the inputs, thereby demonstrating SB on these datasets; (iv) finally, we present a natural ensemble approach that encourages diversity in models by training successive models on features not used by earlier models, and demonstrate that it yields models that are significantly more robust to Gaussian noise.
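A minimal sketch of the ensemble idea in point (iv), assuming the input subspace a trained one-hidden-layer network depends on can be approximated by the top singular directions of its first-layer weights, and that each successive model is trained on inputs projected onto the orthogonal complement of the directions used so far. The function names, toy data, and hyperparameters below are illustrative assumptions, not the submission's actual implementation.

```python
import torch
import torch.nn as nn


def train_one_hidden_layer(x, y, width=256, epochs=200, lr=1e-2):
    """Train a 1-hidden-layer ReLU network with plain gradient descent."""
    d = x.shape[1]
    model = nn.Sequential(nn.Linear(d, width), nn.ReLU(), nn.Linear(width, 1))
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(x).squeeze(-1), y.float())
        loss.backward()
        opt.step()
    return model


def top_directions(model, k_dirs=1):
    """Top-k right singular vectors of the first-layer weights: a proxy for
    the input subspace the trained network depends on (an assumption here)."""
    w = model[0].weight.detach()               # shape (width, d)
    _, _, vt = torch.linalg.svd(w, full_matrices=False)
    return vt[:k_dirs]                         # shape (k_dirs, d)


def sequential_ensemble(x, y, n_models=2, k_dirs=1):
    """Train models one after another, each on the component of the inputs
    orthogonal to the directions used by the earlier models."""
    models, used = [], []
    x_cur = x
    for _ in range(n_models):
        model = train_one_hidden_layer(x_cur, y)
        models.append(model)
        v = top_directions(model, k_dirs)      # directions this model relies on
        used.append(v)
        # Remove those directions so the next model must use other features.
        x_cur = x_cur - x_cur @ v.T @ v
    return models, used


if __name__ == "__main__":
    torch.manual_seed(0)
    # Toy data: the label is linearly separable along coordinate 0, and
    # coordinate 1 carries a redundant copy of the same signal.
    n, d = 1000, 20
    x = torch.randn(n, d)
    y = (x[:, 0] > 0).long()
    x[:, 1] = x[:, 0] + 0.05 * torch.randn(n)
    models, used_dirs = sequential_ensemble(x, y, n_models=2)
```

Projecting out the directions used by earlier models is one natural way to force later models onto unused features; the abstract does not specify the exact mechanism, so this projection-based variant is only a plausible reading.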
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Deep Learning and representational learning
TL;DR: Gradient descent on a 1-hidden-layer neural network learns a function of essentially a lower-dimensional projection of the input.
Supplementary Material: zip