Auto Network Compression with Cross-Validation Gradient

25 Sep 2019 (modified: 24 Dec 2019), ICLR 2020 Conference Withdrawn Submission, Readers: Everyone
  • Abstract: Network compression can shrink large, complex networks into small ones so that they can be deployed on devices with limited resources. Sparse regularization, such as $L^1$ or $L^{2,1}$ regularization, is the most popular way to induce a sparse model. However, it introduces new hyperparameters, which affect not only the degree of sparsity but also whether the network can be trained effectively (gradient explosion or non-convergence). How to select these hyperparameters is an important open problem for regularization-based network compression. In this paper, we propose an automatic network compression framework with a cross-validation gradient, which adjusts the hyperparameters automatically. First, we design a unified framework that combines model parameter learning with hyperparameter learning. Second, to address the non-differentiability of the $L^1$ norm, we introduce auxiliary variables that transform it into a solvable problem, and then obtain the derivative of the model parameters with respect to the hyperparameters. Finally, the derivative with respect to the hyperparameter vector is obtained by the chain rule. To handle the inverse of the Hessian matrix, we compare three methods and compute only the mixed partial derivatives. To a certain extent, this method achieves automatic network compression. Classical network architectures such as VGG, ResNet and DenseNet are tested on the CIFAR-10 and CIFAR-100 datasets to demonstrate the effectiveness of our algorithm.
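The chain-rule step described in the abstract can be illustrated on a toy problem. The sketch below is not the authors' implementation: it swaps the $L^1$ penalty for a smooth ridge ($L^2$) penalty so that no auxiliary variables are needed, uses a linear model instead of a deep network, and solves the Hessian system directly rather than with any of the three methods the paper compares. All function names (`fit_ridge`, `val_loss`, `hypergradient`) and the synthetic data are hypothetical. It shows how the validation-loss gradient with respect to the regularization weight follows from the Hessian of the training objective and the mixed partial derivative.

```python
import numpy as np

def fit_ridge(X, y, lam):
    """Inner problem: argmin_w 0.5*||Xw - y||^2 + 0.5*lam*||w||^2."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def val_loss(w, X_val, y_val):
    """Outer (cross-validation) objective: squared error on held-out data."""
    r = X_val @ w - y_val
    return 0.5 * float(r @ r)

def hypergradient(X, y, X_val, y_val, lam):
    """d(val_loss)/d(lam) via implicit differentiation and the chain rule."""
    d = X.shape[1]
    w = fit_ridge(X, y, lam)
    # Hessian of the training objective with respect to w.
    hessian = X.T @ X + lam * np.eye(d)
    # Mixed partial: d/dlam of the training gradient X^T(Xw - y) + lam*w.
    mixed = w
    # Implicit function theorem: dw/dlam = -H^{-1} * mixed.
    dw_dlam = -np.linalg.solve(hessian, mixed)
    # Chain rule: dL_val/dlam = (dL_val/dw)^T (dw/dlam).
    grad_val = X_val.T @ (X_val @ w - y_val)
    return float(grad_val @ dw_dlam)

# Synthetic data (assumed for illustration only).
rng = np.random.default_rng(0)
X, X_val = rng.normal(size=(40, 5)), rng.normal(size=(20, 5))
w_true = np.array([1.0, 0.0, -2.0, 0.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=40)
y_val = X_val @ w_true + 0.1 * rng.normal(size=20)

lam = 1.0
g = hypergradient(X, y, X_val, y_val, lam)
print("hypergradient at lam = 1.0:", g)
```

In a hyperparameter-learning loop, `g` would drive a gradient step on `lam` between rounds of model training. For a deep network the Hessian is too large to form explicitly, which is presumably why the paper compares several ways of handling its inverse.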