Keywords: Generalization, Trade-off, Ridge Regression, Near-interpolators
Abstract: We study linear regression when the input data population
covariance matrix has eigenvalues $\lambda_i \sim i^{-\alpha}$ where $\alpha > 1$.
Under a generic random matrix theory assumption, we prove
that any near-interpolator, i.e., any ${\beta}$ whose training error is below the noise floor, must have a squared $\ell_2$-norm that grows super-linearly in the number of samples $n$:
$\|{\beta}\|_{2}^{2} = \Omega(n^{\alpha})$. Consequently, existing norm-based generalization bounds grow as the number of samples increases, consistent with empirical observations from prior work.
On the other hand, properly tuned near-interpolators generalize well, with test error approaching the noise floor arbitrarily closely.
Our work demonstrates that existing norm-based generalization bounds are vacuous for explaining
the generalization capability of \emph{any} near-interpolator.
Moreover, we show that the trade-off between training and test error improves as the norm-growth exponent decreases.
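Below is a minimal simulation sketch (not from the paper) illustrating the setup described in the abstract: ridge regression on synthetic Gaussian data whose covariance has a power-law spectrum $\lambda_i = i^{-\alpha}$, where the penalty is decreased until the training error drops below the noise floor and the squared $\ell_2$-norm of the resulting near-interpolator is recorded as $n$ grows. All parameter choices (alpha, dimension, noise level, ridge grid) are illustrative assumptions, not the paper's experimental setup.

```python
# Minimal sketch: norm growth of near-interpolators under a power-law spectrum.
# Assumed setup: Gaussian design with covariance diag(i^{-alpha}), i.i.d. noise.
import numpy as np

rng = np.random.default_rng(0)
alpha, d, sigma2 = 2.0, 2000, 1.0            # spectrum exponent, ambient dimension, noise variance
lam = np.arange(1, d + 1) ** (-alpha)         # population eigenvalues lambda_i = i^{-alpha}
beta_star = rng.standard_normal(d)            # ground-truth coefficients

for n in [100, 200, 400, 800]:
    X = rng.standard_normal((n, d)) * np.sqrt(lam)   # rows have covariance diag(lam)
    y = X @ beta_star + np.sqrt(sigma2) * rng.standard_normal(n)
    # Decrease the ridge penalty until the training error falls below the
    # noise floor sigma2, i.e. until the estimator is a "near-interpolator".
    for ridge in np.logspace(1, -8, 50):
        # Dual (kernel) form of the ridge solution: beta = X^T (X X^T + n*ridge*I)^{-1} y
        beta = X.T @ np.linalg.solve(X @ X.T + n * ridge * np.eye(n), y)
        if np.mean((X @ beta - y) ** 2) < sigma2:
            break
    train_mse = np.mean((X @ beta - y) ** 2)
    print(f"n={n:4d}  train MSE={train_mse:.3f}  ||beta||_2^2={beta @ beta:.1f}")
```

Under these assumptions, the printed $\|\beta\|_2^2$ should grow rapidly with $n$ once the training error is pushed below the noise floor, in line with the $\Omega(n^{\alpha})$ lower bound stated above.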
Submission Number: 90