Keywords: algorithmic stability, minimum-norm interpolant, deep neural networks
Abstract: Algorithmic stability is a classical framework for analyzing the generalization error of learning algorithms.
It predicts that an algorithm is likely to have a small generalization error if it is insensitive to small perturbations in the training set such as the removal or replacement of a training point.
While stability has been demonstrated for numerous well-known algorithms, this framework has had limited success in analyses of neural networks.
In this paper, we study the algorithmic stability of deep ReLU neural networks that achieve zero training error with parameters of the smallest $L_2$ norm, also known as the minimum-norm interpolant, a solution that can be observed in overparameterized models trained by gradient-based algorithms.
We find that such networks are stable when they contain a (possibly small) stable sub-network, followed by a layer with a low-rank weight matrix.
The low-rank assumption is inspired by recent empirical and theoretical results showing that training deep neural networks is biased towards low-rank weight matrices, both under minimum-norm interpolation and under weight-decay regularization.
Furthermore, we present a series of experiments supporting our finding that a trained deep neural network often consists of a stable sub-network and several final low-rank layers.
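For readers unfamiliar with these notions, the following display gives a minimal sketch of the two objects discussed in the abstract; the notation ($f_\theta$ for the network, $S$ for the training set, $S^{\setminus i}$ for the set with the $i$-th point removed, $\ell$ for a loss, $\beta$ for the stability level) is illustrative and not taken from the paper.

% Minimum-norm interpolant (illustrative notation): the zero-training-error parameter vector of smallest $L_2$ norm.
\[
  \hat{\theta}(S) \in \arg\min_{\theta} \|\theta\|_2
  \quad \text{s.t.} \quad f_\theta(x_i) = y_i \ \ \text{for all } i \in \{1,\dots,n\}.
\]
% Uniform (leave-one-out) stability at level $\beta$: removing any single training point
% changes the loss at any test point by at most $\beta$.
\[
  \sup_{S,\, i,\, (x,y)} \bigl|\, \ell\bigl(f_{\hat{\theta}(S)}(x), y\bigr) - \ell\bigl(f_{\hat{\theta}(S^{\setminus i})}(x), y\bigr) \bigr| \;\le\; \beta .
\]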
Is Neurips Submission: No
Submission Number: 37