Keywords: generalization, empirical phenomena, overparameterization
Abstract: Classifiers in machine learning are often reduced to single-dimensional quantities, such as test error or loss. Here, we initiate a much richer study of classifiers by considering the entire joint distribution of their inputs and outputs. We present both new empirical behaviors of standard classifiers and quantitative conjectures which capture these behaviors. Informally, our conjecture states: the output distribution of an interpolating classifier matches the distribution of true labels, when conditioned on certain subgroups of the input space. For example, if we mislabel 30% of dogs as cats in the train set of CIFAR-10, then a ResNet trained to interpolation will in fact mislabel roughly 30% of dogs as cats on the *test set* as well, while leaving other classes unaffected. This conjecture has implications for the theory of overparameterization, scaling limits, implicit bias, and statistical consistency. Further, it can be seen as a new kind of generalization, which goes beyond measuring single-dimensional quantities to measuring entire distributions.
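The CIFAR-10 label-noise experiment described above can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: labels are simulated with NumPy rather than loaded from CIFAR-10, the class indices (cat = 3, dog = 5) follow the standard CIFAR-10 ordering, and the training of the interpolating ResNet is elided, leaving only the construction of the noisy train set and the quantity the conjecture predicts on held-out data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated stand-in for CIFAR-10 train labels (50,000 examples, 10 classes).
# Standard CIFAR-10 class indices: cat = 3, dog = 5.
CAT, DOG = 3, 5
y_train = rng.integers(0, 10, size=50_000)

# Mislabel 30% of dogs as cats in the training set.
dog_idx = np.where(y_train == DOG)[0]
flip = rng.random(len(dog_idx)) < 0.30
y_noisy = y_train.copy()
y_noisy[dog_idx[flip]] = CAT

# After training an interpolating classifier f on (x_train, y_noisy)
# (elided here), the conjecture predicts on the *test set*:
#     P[ f(x) = CAT | true label of x = DOG ] ≈ 0.30,
# with other classes essentially unaffected.
# Here we just verify the noise rate constructed in the train set:
train_noise_rate = float(np.mean(y_noisy[dog_idx] == CAT))
print(round(train_noise_rate, 2))
```

The conditional rate `P[f(x) = CAT | true label = DOG]` is the distributional quantity the abstract refers to: it is computed on a subgroup of the input space (true dogs) rather than aggregated into a single test error.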
One-sentence Summary: We study generalization beyond single-dimensional metrics, with new empirical behaviors and formal conjectures.