Parameter efficient training of deep convolutional neural networks by dynamic sparse reparameterization

Hesham Mostafa; Xin Wang

27 Sept 2018 (modified: 05 May 2023); ICLR 2019 Conference Blind Submission; Readers: Everyone

Abstract: Modern deep neural networks are highly overparameterized and often very large. A number of post-training model compression techniques, such as distillation, pruning, and quantization, can reduce the size of network parameters by a substantial fraction with little loss in performance. However, training a small network of the post-compression size de novo typically fails to reach the accuracy achieved by compressing a large network, leading to a widely held belief that gross overparameterization is essential to effective learning. In this work, we argue that this is not necessarily true. We describe a dynamic sparse reparameterization technique that closed the performance gap between a model compressed through iterative pruning and a model of the post-compression size trained de novo. We applied our method to training deep residual networks and showed that it outperformed existing reparameterization techniques, yielding the best accuracy for a given training parameter budget. Compared to existing dynamic reparameterization methods that reallocate non-zero parameters during training, our approach achieved better performance at lower computational cost. Our method is not only of practical value for training under stringent memory constraints, but also potentially informative for the theoretical understanding of the generalization properties of overparameterized deep neural networks.
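
The abstract does not spell out the reallocation rule, so the following is only an illustrative sketch of the general idea of reallocating non-zero parameters during training under a fixed budget, not the authors' method. It assumes magnitude-based pruning and uniformly random regrowth; the function name `reallocate`, its arguments, and the choice to initialize regrown weights at zero are all hypothetical.

```python
import random


def reallocate(weights, mask, prune_fraction, rng):
    """One prune-and-regrow step on a flat list of weights (illustrative only).

    weights: list of floats.
    mask: list of 0/1 flags, where 1 marks an active (non-zero) parameter.
    prune_fraction: share of active parameters to move in this step.
    rng: a random.Random instance used to pick regrowth positions.

    Both lists are modified in place. The number of active parameters is
    the same before and after the call, so the parameter budget is fixed.
    """
    active = [i for i, m in enumerate(mask) if m]
    inactive = [i for i, m in enumerate(mask) if not m]

    # Never move more parameters than there are free positions to regrow into.
    n_move = min(int(len(active) * prune_fraction), len(inactive))

    # Prune (assumption): drop the active weights with the smallest magnitude.
    pruned = sorted(active, key=lambda i: abs(weights[i]))[:n_move]

    # Regrow (assumption): pick positions uniformly at random among those
    # that were inactive before this step.
    grown = rng.sample(inactive, n_move)

    for i in pruned:
        mask[i] = 0
        weights[i] = 0.0
    for i in grown:
        mask[i] = 1
        weights[i] = 0.0  # assumption: new connections start at zero

    return pruned, grown


rng = random.Random(0)
weights = [0.9, -0.05, 0.0, 0.4, 0.0, -0.7, 0.01, 0.0]
mask = [1, 1, 0, 1, 0, 1, 1, 0]
pruned, grown = reallocate(weights, mask, 0.4, rng)
print(sum(mask))  # still 5 active parameters
```

In a real training loop a step like this would run periodically between gradient updates, with gradients applied only where the mask is 1. How often to reallocate and how to distribute the budget across layers are design choices this sketch leaves open.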

Keywords: sparse, reparameterization, overparameterization, convolutional neural network, training, compression, pruning

TL;DR: We describe a dynamic sparse reparameterization technique that allows a small sparse network to be trained to generalize on par with, or better than, a full-sized dense model compressed to the same size.
