Keywords: concentration inequalities, isoperimetry, robustness, stability, classification problems, generalization, overparameterization
TL;DR: We show that interpolating classifiers can only be stable, and thus generalize well, if they are sufficiently overparameterized.
Abstract: The relationship between overparameterization, stability, and generalization
remains incompletely understood in the setting of discontinuous classifiers. We
address this gap by establishing a generalization bound for finite function classes that improves
inversely with _class stability_, defined as the expected distance
to the decision boundary in the input domain (margin). Interpreting class stability as a quantifiable notion
of robustness, we derive as a corollary a _law of robustness_ for
classification that extends the results of Bubeck and Sellke beyond
smoothness assumptions to discontinuous functions. In particular, any
interpolating model with $p \approx n$ parameters on $n$ data points must be
_unstable_, implying that substantial overparameterization is necessary
to achieve high stability. We obtain analogous results for (parameterized) infinite function classes by analyzing a stronger robustness measure derived from the margin in the co-domain, which we refer to as the _normalized co-stability_. Experiments support our theory: stability increases with model size and correlates with test performance, while traditional norm-based measures remain largely uninformative.
Primary Area: learning theory
Submission Number: 24557