Generalization Error Bound via Embedding Dimension and Network Lipschitz Constant

17 Sept 2025 (modified: 17 Jan 2026) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Generalization Error Bound, Intrinsic Dimension, Wasserstein Distance, Lipschitz continuity
Abstract: Modern deep networks generalize well even in heavily over-parameterized regimes, where traditional parameter-based bounds become vacuous. We propose a representation-centric view of generalization, showing that the generalization error is controlled jointly by (i) the intrinsic dimension of the learned embeddings, which reflects how much the data distribution is compressed and determines how quickly the empirical distribution of embeddings converges to the population distribution in Wasserstein distance, and (ii) the sensitivity of the downstream mapping from embeddings to predictions, quantified by Lipschitz constants. Together, these factors yield a new generalization error bound that explicitly links embedding dimension with network architecture. At the final embedding layer, architectural sensitivity vanishes and the bound is driven more strongly by embedding dimension, explaining why final-layer dimensionality is often a strong empirical predictor of generalization. Experiments across datasets, architectures, and controlled interventions validate the theoretical predictions and demonstrate the practical value of embedding-based diagnostics. Overall, this work shifts the focus of generalization analysis from parameter counts to representation geometry, offering both theoretical insight and actionable tools for deep learning practice.
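
As a rough sketch of the type of bound the abstract describes (illustrative notation only, ignoring labels and regularity details; the paper's exact statement, constants, and assumptions are not reproduced here), write phi for the encoder, h for the downstream head with Lipschitz constant L_h, ell for an L_ell-Lipschitz loss, mu for the population distribution of embeddings with intrinsic dimension d, and hat{mu}_n for its empirical counterpart over n samples. A Kantorovich–Rubinstein-style argument then gives, up to constants and for d > 2,

    \[
      \bigl| R(h \circ \phi) - \widehat{R}_n(h \circ \phi) \bigr|
      \;\le\; L_\ell \, L_h \, W_1(\hat{\mu}_n, \mu)
      \;\lesssim\; L_\ell \, L_h \, n^{-1/d},
    \]

where n^{-1/d} is the standard rate at which empirical measures converge to the population in Wasserstein-1 distance when the support is effectively d-dimensional. At the final embedding layer the architectural factor L_h is minimal, so the dimension-driven rate dominates, consistent with the abstract's claim.

On the diagnostic side, a minimal way to probe the embedding-dimension factor is to estimate the intrinsic dimension of collected activations. The sketch below uses the TwoNN estimator (Facco et al., 2017), which is one common choice and not necessarily the estimator used in the paper; the function name and interface are hypothetical, introduced here only for illustration.

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def twonn_intrinsic_dimension(embeddings: np.ndarray) -> float:
        """Estimate the intrinsic dimension of a point cloud (e.g. final-layer
        activations over a held-out set) with the TwoNN estimator (Facco et al., 2017)."""
        # Distances to the two nearest neighbours of every point (column 0 is the point itself).
        dists, _ = NearestNeighbors(n_neighbors=3).fit(embeddings).kneighbors(embeddings)
        r1, r2 = dists[:, 1], dists[:, 2]
        mu = r2 / r1                              # ratio of 2nd- to 1st-neighbour distance
        mu = mu[np.isfinite(mu) & (mu > 1.0)]     # drop duplicate / degenerate points
        # Under the TwoNN model, mu follows a Pareto(d) law; its maximum-likelihood estimate is:
        return len(mu) / float(np.sum(np.log(mu)))

Comparing such an estimate across layers, architectures, or training interventions is the kind of representation-level diagnostic the abstract refers to.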
Primary Area: learning theory
Submission Number: 8893