MLPrune: Multi-Layer Pruning for Automated Neural Network Compression

Wenyuan Zeng, Raquel Urtasun

Sep 27, 2018 ICLR 2019 Conference Blind Submission readers: everyone Show Bibtex
  • Abstract: Model compression can significantly reduce the computation and memory footprint of large neural networks. To achieve a good trade-off between model size and accuracy, popular compression techniques usually rely on hand-crafted heuristics and require manually setting the compression ratio of each layer. This process is typically costly and suboptimal. In this paper, we propose a Multi-Layer Pruning method (MLPrune), which is theoretically sound, and can automatically decide appropriate compression ratios for all layers. Towards this goal, we use an efficient approximation of the Hessian as our pruning criterion, based on a Kronecker-factored Approximate Curvature method. We demonstrate the effectiveness of our method on several datasets and architectures, outperforming previous state-of-the-art by a large margin. Our experiments show that we can compress AlexNet and VGG16 by 25x without loss in accuracy on ImageNet. Furthermore, our method has much fewer hyper-parameters and requires no expert knowledge.
  • Keywords: Automated Model Compression, Neural Network Pruning
  • TL;DR: MLPrune: an automated pruning method that doesn't require any tuning for per-layer compression ratio, achieves state-of-the-art pruning results on AlexNet and VGG16.
0 Replies