Feature learning is decoupled from generalization in high capacity neural networks

Published: 09 Jun 2025, Last Modified: 07 Jul 2025
HiLD at ICML 2025 Poster
License: CC BY 4.0
Keywords: feature learning, generalization, rich/lazy regime, neural tangent kernel, conjugate kernel
TL;DR: We examine existing theories of feature learning and demonstrate that they primarily assess the strength of feature learning, rather than the quality of the learned features themselves.
Abstract: Neural networks outperform kernel methods, sometimes by orders of magnitude, e.g., on staircase functions. This advantage stems from the ability of neural networks to learn features, adapting their hidden representations to better capture the data. We introduce a concept we call feature quality to measure this performance improvement. We examine existing theories of feature learning and demonstrate empirically that they primarily assess the strength of feature learning, rather than the quality of the learned features themselves. Consequently, current theories of feature learning do not provide a sufficient foundation for developing theories of neural network generalization.
Student Paper: No
Submission Number: 31