Iterative Deep Compression : Compressing Deep Networks for Classification and Semantic Segmentation

Sugandha Doda, Vitor Fortes Rey, Dr. Nadereh Hatami, Prof. Dr. Paul Lukowicz

Feb 15, 2018 (modified: Feb 15, 2018) ICLR 2018 Conference Blind Submission readers: everyone Show Bibtex
  • Abstract: Machine learning and in particular deep learning approaches have outperformed many traditional techniques in accomplishing complex tasks such as image classfication, natural language processing or speech recognition. Most of the state-of-the art deep networks have complex architecture and use a vast number of parameters to reach this superior performance. Though these networks use a large number of learnable parameters, those parameters present significant redundancy. Therefore, it is possible to compress the network without much affecting its accuracy by eliminating those redundant and unimportant parameters. In this work, we propose a three stage compression pipeline, which consists of pruning, weight sharing and quantization to compress deep neural networks. Our novel pruning technique combines magnitude based ones with dense sparse dense ideas and iteratively finds for each layer its achievable sparsity instead of selecting a single threshold for the whole network. Unlike previous works, where compression is only applied on networks performing classification, we evaluate and perform compression on networks for classification as well as semantic segmentation, which is greatly useful for understanding scenes in autonomous driving. We tested our method on LeNet-5 and FCNs, performing classification and semantic segmentation, respectively. With LeNet-5 on MNIST, pruning reduces the number of parameters by 15.3 times and storage requirement from 1.7 MB to 0.006 MB with accuracy loss of 0.03%. With FCN8 on Cityscapes, we decrease the number of parameters by 8 times and reduce the storage requirement from 537.47 MB to 18.23 MB with class-wise intersection-over-union (IoU) loss of 4.93% on the validation data.