How Width and Data Shape Generalization Scaling Laws in Quadratic Neural Networks

Published: 29 May 2026, Last Modified: 29 May 2026 | HiLD at ICML 2026 Poster | CC BY 4.0
Keywords: scaling laws, feature learning, quadratic neural networks, high-dimensional asymptotics, phase diagram
TL;DR: We characterize scaling laws for feature-learning quadratic networks, showing how generalization depends jointly on data, width, regularization, and the spectral structure of the target.
Abstract: Understanding how performance scales jointly with model size and data is a central problem in modern machine learning. Existing theoretical works on scaling laws typically describe generalization as a function of data or compute, often in fixed-feature or infinite-width regimes and for online SGD. Here, we instead study how generalization scales with the number of trainable parameters and the number of samples in a feature-learning model. We analyze L2-regularized empirical risk minimization in a quadratic two-layer network in a finite-sample setting with structured data. This setting allows for an explicit asymptotic characterization of the generalization error as a function of the number of samples, model width, and regularization. Our results reveal a rich structure of scaling regimes as the number of parameters varies. In particular, in the feature-learning regime, the generalization error follows data-dependent power laws controlled by the spectral structure of the target. We further characterize the transitions between regimes, including the onset of interpolation, and their impact on generalization.
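To make the setup in the abstract concrete, here is a minimal NumPy sketch of L2-regularized empirical risk minimization for a width-m quadratic two-layer network. The abstract does not give the exact parameterization, so everything specific here is an assumption for illustration: the form f(x) = (1/m) Σ_k (w_k · x)², the squared loss, isotropic Gaussian inputs, a quadratic teacher, plain gradient descent, and the names `quadratic_net`, `regularized_risk`, `risk_gradient`, and `train`.

```python
import numpy as np

def quadratic_net(W, X):
    """Assumed model: f(x) = (1/m) * sum_k (w_k . x)^2, with W of shape (m, d)."""
    return np.mean((X @ W.T) ** 2, axis=1)

def regularized_risk(W, X, y, lam):
    """Empirical squared loss plus an L2 penalty on the first-layer weights."""
    residual = quadratic_net(W, X) - y
    return 0.5 * np.mean(residual ** 2) + 0.5 * lam * np.sum(W ** 2)

def risk_gradient(W, X, y, lam):
    """Gradient of regularized_risk with respect to W, shape (m, d)."""
    n, m = X.shape[0], W.shape[0]
    pre = X @ W.T                                # pre-activations, shape (n, m)
    residual = np.mean(pre ** 2, axis=1) - y     # shape (n,)
    # d f(x_i) / d w_k = (2/m) * (w_k . x_i) * x_i
    grad = (2.0 / (m * n)) * (pre * residual[:, None]).T @ X
    return grad + lam * W

def train(W0, X, y, lam, lr=0.01, steps=200):
    """Plain gradient descent on the regularized empirical risk."""
    W = W0.copy()
    for _ in range(steps):
        W -= lr * risk_gradient(W, X, y, lam)
    return W

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, m, n, lam = 5, 4, 50, 1e-3                # hypothetical sizes
    W_star = rng.normal(size=(2, d)) / np.sqrt(d)   # hypothetical teacher
    X = rng.normal(size=(n, d))
    y = quadratic_net(W_star, X)
    W0 = rng.normal(size=(m, d)) / np.sqrt(d)
    W = train(W0, X, y, lam)
    print("initial risk:", regularized_risk(W0, X, y, lam))
    print("final risk:  ", regularized_risk(W, X, y, lam))
```

Sweeping `n`, `m`, and `lam` in a sketch like this and evaluating the error on fresh samples is one way to probe empirically the joint dependence on samples, width, and regularization that the abstract describes; it does not reproduce the paper's asymptotic analysis or its structured-data model.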
Submission Number: 56