Keywords: Model Compression, Pruning, DNNs, Training, Fine-tuning
Abstract: Pruning is one of the main compression methods for Deep Neural Networks (DNNs), where less relevant parameters are removed from a DNN model to reduce its memory footprint. To achieve better final accuracy, pruning is often performed iteratively, with an increasing number of parameters removed in each step and fine-tuning (i.e., additional training epochs) applied to the remaining parameters. However, this process can be very time-consuming, since fine-tuning is applied after every pruning step and computes gradients for the whole model. Motivated by these overheads, in this paper we propose ICE-Pick, a novel threshold-guided fine-tuning method which freezes less sensitive layers and leverages a custom pruning-aware learning rate scheduler. We evaluate ICE-Pick using ResNet-110, ResNet-152, and MobileNetV2 (all defined for CIFAR-10), and show that it can save up to 87.6% of the pruning time while maintaining accuracy.
Submission Number: 19
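To make the idea in the abstract concrete, below is a minimal, hypothetical sketch of one iterative pruning step with threshold-guided layer freezing. The function names, the magnitude-pruning criterion, and the L2-change sensitivity measure are illustrative assumptions, not the paper's actual ICE-Pick implementation or its learning rate scheduler.

```python
import numpy as np


def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction of a layer's weights.

    Illustrative assumption: ICE-Pick is pruning-method agnostic; simple
    magnitude pruning is used here only as a stand-in.
    """
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned


def sensitivity(before, after):
    """L2 norm of the change a pruning step caused in a layer
    (a stand-in for a real per-layer sensitivity metric)."""
    return float(np.linalg.norm(before - after))


def ice_pick_step(layers, sparsity, freeze_threshold):
    """One pruning step: prune every layer, then mark layers whose
    weights barely changed as frozen, so fine-tuning can skip their
    gradient computation (the source of the claimed time savings)."""
    new_layers, frozen = [], []
    for w in layers:
        pruned = magnitude_prune(w, sparsity)
        frozen.append(sensitivity(w, pruned) < freeze_threshold)
        new_layers.append(pruned)
    return new_layers, frozen
```

A fine-tuning loop would then update only the layers whose `frozen` flag is `False`, instead of computing gradients for the whole model after each pruning step.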