Can network pruning benefit deep learning under label noise?

29 Sept 2021 (modified: 13 Feb 2023) · ICLR 2022 Conference Withdrawn Submission · Readers: Everyone
Keywords: network pruning, label noise, double descent, sparse loss landscape
Abstract: Network pruning is a widely used technique to reduce the computational cost of over-parameterized neural networks. Conventional wisdom also regards pruning as a way to improve generalization: by zeroing out parameters, pruning reduces model capacity and prevents overfitting. However, this wisdom has been challenged by a line of recent studies showing that over-parameterization actually helps generalization. In this work, we demonstrate the existence of a novel double descent phenomenon in sparse regimes: in the presence of label noise, medium sparsity induced by pruning hurts model performance, while high sparsity benefits it. Through extensive experiments on noisy versions of MNIST, CIFAR-10 and CIFAR-100, we show that proper pruning consistently provides non-trivial robustness against label noise, which offers a new lens for studying network pruning. Further, we reassess some common beliefs concerning the generalization of sparse networks, and hypothesize that it is the distance from initialization, rather than sharpness/flatness, that is key to robustness. Experimental results are consistent with this hypothesis. Together, our study provides valuable insight into whether, when and why network pruning benefits deep learning under label noise.
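For concreteness, the two ingredients discussed above, magnitude pruning (zeroing out the smallest-magnitude weights) and the distance-from-initialization measure, can be sketched roughly as follows. This is a minimal PyTorch illustration, not the authors' implementation; the model, sparsity level, and training loop are placeholder assumptions.

```python
import copy
import torch
import torch.nn as nn

def magnitude_prune(model: nn.Module, sparsity: float) -> dict:
    """Zero out the globally smallest-magnitude weights; return binary masks."""
    weights = [p for p in model.parameters() if p.dim() > 1]  # skip biases/norm params
    scores = torch.cat([w.detach().abs().flatten() for w in weights])
    k = int(sparsity * scores.numel())
    threshold = torch.kthvalue(scores, k).values if k > 0 else scores.min() - 1
    masks = {}
    for name, p in model.named_parameters():
        if p.dim() > 1:
            mask = (p.detach().abs() > threshold).float()
            p.data.mul_(mask)   # prune in place
            masks[name] = mask  # reapply after each optimizer step to keep weights pruned
    return masks

def distance_from_init(model: nn.Module, init_state: dict) -> float:
    """L2 distance between current parameters and their values at initialization."""
    sq = 0.0
    for name, p in model.named_parameters():
        sq += (p.detach() - init_state[name]).pow(2).sum().item()
    return sq ** 0.5

# Usage sketch (MyNet and the noisy-label training loop are hypothetical):
# model = MyNet()
# init_state = copy.deepcopy(model.state_dict())
# masks = magnitude_prune(model, sparsity=0.95)
# ... train on noisy labels, multiplying each parameter by its mask after every step ...
# print(distance_from_init(model, init_state))
```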
One-sentence Summary: We demonstrate the double descent phenomenon in sparse regimes, show the label-noise robustness of highly sparse networks, and provide a hypothesis for the reasons behind it.
Supplementary Material: zip
6 Replies