Keywords: Deep Neural Networks, Nature of Generalization, Pointwise Riemannian Dimension, Feature Learning, Finite-Scale Geometry, Avoiding NTK and Exponential Norm Barriers
TL;DR: A complete generalization theory for fully connected deep nets: bounds depend on the effective rank of the learned features at the trained model, and are empirically orders of magnitude tighter.
Abstract: We address the long-standing question of why deep neural networks generalize by establishing a complete pointwise generalization theory for fully connected networks. For each trained model, we equip the hypothesis with a pointwise Riemannian dimension defined through the effective ranks of the {\it learned} feature matrices across layers, and derive hypothesis- and data-dependent generalization bounds. These spectrum-aware bounds break long-standing barriers and are orders of magnitude tighter in both theory and experiment, rigorously surpassing bounds based on model size, products of norms, and infinite-width linearizations. Analytically, we identify the structural properties and mathematical principles that make deep networks tractable. Empirically, the pointwise Riemannian dimension exhibits substantial dimensionality reduction, decreases with increased over-parameterization, and captures feature learning and the implicit bias of optimizers across standard datasets and modern architectures. Taken together, these results show that deep networks are mathematically tractable in the practical regime and that their generalization is sharply explained by a pointwise, spectrum-aware complexity.
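As a rough illustration of the spectrum-aware quantity the abstract refers to, the sketch below computes an entropy-based effective rank of a per-layer feature matrix on a data batch. The specific definition of effective rank (exponentiated spectral entropy) and the function names are assumptions for illustration only; they are not claimed to match the paper's construction of the pointwise Riemannian dimension.

```python
import numpy as np

def effective_rank(features: np.ndarray, eps: float = 1e-12) -> float:
    """Entropy-based effective rank of a feature matrix (one common definition;
    assumed here, not necessarily the paper's).

    `features` is an (n_samples, width) matrix of a layer's activations
    evaluated on a batch of data from a trained network.
    """
    s = np.linalg.svd(features, compute_uv=False)   # singular values of the feature matrix
    p = s / (s.sum() + eps)                         # normalize to a distribution over the spectrum
    entropy = -(p * np.log(p + eps)).sum()          # spectral (Shannon) entropy
    return float(np.exp(entropy))                   # exp(entropy) = effective rank

def layerwise_effective_ranks(feature_matrices):
    """Hypothetical helper: summarize a trained net by the effective rank
    of each layer's learned feature matrix."""
    return [effective_rank(F) for F in feature_matrices]
```

In this sketch, a strongly low-rank spectrum yields an effective rank far below the layer width, which is the kind of dimensionality reduction the abstract reports for trained networks.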
Primary Area: learning theory
Submission Number: 17867