Just How Flexible are Neural Networks in Practice?

27 Sept 2024 (modified: 05 Feb 2025) · Submitted to ICLR 2025 · CC BY 4.0
Keywords: Neural networks, approximation theory, model complexity, generalization
TL;DR: The paper empirically investigates the practical flexibility of neural networks, showing that the optimizer, the architecture, and properties of the data (such as label correctness) substantially affect how many samples a network can fit.
Abstract: Although overparameterization theory suggests that neural networks can fit any dataset with up to as many samples as they have parameters, practical limitations often prevent them from reaching this capacity. In this study, we empirically investigate the practical flexibility of neural networks and uncover several surprising findings. Firstly, we observe that standard optimizers, such as stochastic gradient descent (SGD), often converge to solutions that fit significantly fewer samples than the model's parameter count, highlighting a gap between theoretical and practical capacity. Secondly, we find that convolutional neural networks (CNNs) are substantially more parameter-efficient than multi-layer perceptrons (MLPs) and Vision Transformers (ViTs), even when trained on randomly labeled data, emphasizing the role of architectural inductive biases. Thirdly, we demonstrate that the difference in a network's ability to fit correctly labeled data versus incorrectly labeled data is a strong predictor of generalization performance, offering a novel metric for predicting generalization. Lastly, we show that stochastic training methods like SGD enable networks to fit more data than full-batch gradient descent, suggesting that stochasticity enhances flexibility beyond regularization effects. These findings highlight the importance of understanding practical capacity limits and their implications for model generalization, providing new insights into neural network training and architectural design.
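As a rough illustration of the kind of measurement the abstract describes (not the authors' protocol or released code), the sketch below trains the same small model once on data with its original labels and once with randomized labels, then reports the fraction of training samples fit in each case. The `fraction_fit` helper, the toy MLP, the synthetic data, and all hyperparameters are hypothetical choices for demonstration only.

```python
# Hypothetical sketch: measure how much of a training set a model can fit,
# and compare fits on clean vs. randomly labeled data (the gap between the
# two is the kind of quantity the abstract links to generalization).
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def fraction_fit(model, inputs, labels, epochs=200, lr=1e-3, batch_size=128, device="cpu"):
    """Train `model` on (inputs, labels) and return the fraction of training
    samples it classifies correctly at the end of training."""
    model = model.to(device)
    loader = DataLoader(TensorDataset(inputs, labels), batch_size=batch_size, shuffle=True)
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    model.eval()
    with torch.no_grad():
        preds = model(inputs.to(device)).argmax(dim=1).cpu()
    return (preds == labels).float().mean().item()

# Toy usage on synthetic 10-class data (purely illustrative).
torch.manual_seed(0)
n, d, classes = 2000, 32, 10
x = torch.randn(n, d)
y_clean = (x[:, 0] > 0).long() * 5 + (x[:, 1] > 0).long()  # labels with some structure
y_random = torch.randint(0, classes, (n,))                  # label structure destroyed

def make_mlp():
    return nn.Sequential(nn.Linear(d, 64), nn.ReLU(), nn.Linear(64, classes))

fit_clean = fraction_fit(make_mlp(), x, y_clean)
fit_random = fraction_fit(make_mlp(), x, y_random)
print(f"clean fit: {fit_clean:.3f}  random-label fit: {fit_random:.3f}  "
      f"gap: {fit_clean - fit_random:.3f}")
```

Under the abstract's framing, a larger clean-versus-random fitting gap would indicate a stronger architectural inductive bias and, per the paper's third finding, predict better generalization.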
Primary Area: optimization
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 12016