Abstract: Deep neural networks (DNNs) have achieved great success across numerous application domains. However, DNNs generally demand considerable memory and computational resources, hindering their deployment on front-end systems with constrained storage and computing power. In this paper, we propose a multiscale compression approach for slimming and accelerating DNNs, which combines loss-aware adaptive selection of saliency-based filter-pruning criteria with a hints-aware knowledge-distillation strategy for fine-tuning the pruned models. At each pruning step, we adaptively pick a saliency-based filter-pruning criterion from a predefined set and select which layer to prune according to the model's loss evaluated on a small training subset, so that iterative layer-wise filter pruning adaptively and dynamically removes redundant filters from the over-parameterized network. Noting that teacher and student networks have similar feature representations, we leverage knowledge-distillation fine-tuning to transfer dark knowledge and hints from the teacher, thereby enhancing the student model's performance. Our approach alternates between pruning and fine-tuning stages iteratively, achieving an effective trade-off between model compression and accuracy. Extensive experimental results show that the proposed approach outperforms existing model compression methods on a variety of deep neural network architectures, demonstrating superior generalization ability.
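The abstract describes an iterative loop of loss-aware criterion/layer selection followed by hints-based knowledge-distillation fine-tuning. The PyTorch sketch below illustrates one possible reading of that loop; it is not the authors' implementation. The criterion set (L1/L2 filter saliency), the `subset_loss` scorer, the `last_feat` hint attribute, and all hyperparameters (`prune_frac`, `T`, `alpha`, `beta`) are illustrative assumptions, and zeroing filters stands in for true structural pruning.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

# --- Hypothetical saliency criteria for ranking conv filters (assumed examples) ---
def l1_saliency(w):            # w: (out_channels, in_channels, k, k)
    return w.abs().flatten(1).sum(dim=1)

def l2_saliency(w):
    return w.flatten(1).norm(dim=1)

CRITERIA = {"l1": l1_saliency, "l2": l2_saliency}

@torch.no_grad()
def subset_loss(model, subset, device="cpu"):
    """Loss on a small training subset, used to score pruning candidates."""
    model.eval()
    total, n = 0.0, 0
    for x, y in subset:
        x, y = x.to(device), y.to(device)
        total += F.cross_entropy(model(x), y, reduction="sum").item()
        n += y.numel()
    return total / max(n, 1)

@torch.no_grad()
def select_and_prune(model, subset, prune_frac=0.1, device="cpu"):
    """One pruning step: try every (criterion, conv layer) pair, zero out the
    least-salient filters, and keep the choice with the lowest subset loss."""
    best = (None, None, float("inf"))
    conv_layers = [(n, m) for n, m in model.named_modules() if isinstance(m, nn.Conv2d)]
    for crit_name, crit in CRITERIA.items():
        for layer_name, _ in conv_layers:
            candidate = copy.deepcopy(model).to(device)
            cand_conv = dict(candidate.named_modules())[layer_name]
            scores = crit(cand_conv.weight)
            k = max(1, int(prune_frac * scores.numel()))
            drop = scores.argsort()[:k]           # indices of least-salient filters
            cand_conv.weight[drop] = 0.0
            if cand_conv.bias is not None:
                cand_conv.bias[drop] = 0.0
            loss = subset_loss(candidate, subset, device)
            if loss < best[2]:
                best = (candidate, (crit_name, layer_name), loss)
    return best  # (pruned model, chosen (criterion, layer), resulting subset loss)

def kd_finetune_step(student, teacher, x, y, opt, T=4.0, alpha=0.5, beta=0.1):
    """One fine-tuning step mixing the task loss, a soft-label (dark-knowledge)
    loss, and a hint loss on intermediate features (assumed `.last_feat` attribute;
    a real implementation would likely use forward hooks)."""
    teacher.eval()
    with torch.no_grad():
        t_logits = teacher(x)
    s_logits = student(x)
    ce = F.cross_entropy(s_logits, y)
    kd = F.kl_div(F.log_softmax(s_logits / T, dim=1),
                  F.softmax(t_logits / T, dim=1),
                  reduction="batchmean") * T * T
    hint = F.mse_loss(student.last_feat, teacher.last_feat.detach()) \
        if hasattr(student, "last_feat") and hasattr(teacher, "last_feat") else 0.0
    loss = ce + alpha * kd + beta * hint
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```

In the framework outlined by the abstract, such a pruning step and a number of KD fine-tuning steps would alternate until the desired compression ratio is reached.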
External IDs: dblp:conf/icic/LinLP25