All-in-One Hardware-Oriented Model Compression for Efficient Multi-Hardware Deployment

Haoxuan Wang, Pengyang Ling, Xin Fan, Tao Tu, Jinjin Zheng, Huaian Chen, Yi Jin, Enhong Chen

Published: 2024, Last Modified: 23 Jan 2026. IEEE Trans. Circuits Syst. Video Technol. 2024. License: CC BY-SA 4.0

Abstract: Structured pruning is an efficient compression technique that significantly reduces the inference latency and energy consumption of convolutional neural networks (CNNs) by eliminating redundant filters. However, existing methods incur high algorithm costs in multi-hardware deployment scenarios, which involve several budgets across multiple hardware devices. To tackle this challenge, we propose a novel all-in-one hardware-oriented compression framework (AHC), which integrates structured pruning and data pruning to rapidly generate a large number of hardware-efficient models with ultra-low pruning and fine-tuning costs. Specifically, AHC develops a unified hardware-aware pruning (UHP) method, which rapidly generates numerous hardware-efficient models for several budgets across multiple hardware devices in a single pruning process, thereby reducing pruning costs in multi-hardware deployment scenarios. Moreover, AHC introduces a progressive data pruning (PDP) strategy, which gradually removes samples that contribute negligibly to improving the predictive ability of the pruned models, thereby accelerating fine-tuning with negligible performance loss. Extensive experiments demonstrate the superiority of AHC over state-of-the-art (SOTA) structured pruning methods in terms of algorithm costs, latency, and accuracy. In particular, compared with the SOTA hardware-oriented pruning method, AHC achieves comparable performance while reducing pruning costs by $5.3\times$ and fine-tuning costs by $2.7\times$ in multi-hardware deployment scenarios. Code is available at https://github.com/HXuan-Wang/AHC.
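To make the two ideas in the abstract more concrete, the sketch below illustrates them under simplified, hypothetical assumptions; it is not AHC's actual algorithm (see the linked repository for that). For the UHP idea, filters are ranked once by an importance score and that single ranking is reused to derive one pruned width per (device, budget) pair, with latency assumed to grow linearly with the number of kept filters per layer (a hypothetical cost model; real latency would be measured on each device). For the PDP idea, training loss is assumed as a proxy for a sample's contribution, and the lowest-loss fraction is dropped each time the function is called. The names `model_latency`, `derive_subnets`, and `progressive_data_prune` are illustrative only.

```python
from typing import Dict, List, Sequence, Tuple


def model_latency(kept: Sequence[int], ms_per_filter: Sequence[float]) -> float:
    """Hypothetical linear latency model: sum over layers of kept filters * cost."""
    return sum(k * c for k, c in zip(kept, ms_per_filter))


def derive_subnets(
    importance: Sequence[Sequence[float]],   # importance[layer][filter]
    devices: Dict[str, Sequence[float]],     # device -> ms per filter, per layer
    budgets: Dict[str, Sequence[float]],     # device -> latency budgets (ms)
) -> Dict[Tuple[str, float], List[int]]:
    """Rank filters once, then derive kept-filter counts for every (device, budget)."""
    # Each layer always keeps its single most important filter so no layer vanishes;
    # all remaining filters go into one global pool ranked by importance.
    pool = []
    for layer, scores in enumerate(importance):
        for score in sorted(scores, reverse=True)[1:]:
            pool.append((score, layer))
    pool.sort(reverse=True)  # computed once and shared by every target

    results: Dict[Tuple[str, float], List[int]] = {}
    for device, cost in devices.items():
        for budget in budgets[device]:
            kept = [1] * len(importance)
            # Greedily add filters in global importance order while the budget holds.
            for _, layer in pool:
                kept[layer] += 1
                if model_latency(kept, cost) > budget:
                    kept[layer] -= 1  # this filter would exceed the budget; skip it
            results[(device, budget)] = kept
    return results


def progressive_data_prune(sample_losses: Dict[int, float], drop_fraction: float) -> List[int]:
    """Drop the lowest-loss fraction of samples (assumed to matter least); return kept ids."""
    ranked = sorted(sample_losses, key=sample_losses.get)  # lowest loss first
    n_drop = int(len(ranked) * drop_fraction)
    return sorted(ranked[n_drop:])


if __name__ == "__main__":
    subnets = derive_subnets(
        importance=[[0.9, 0.5, 0.1], [0.8, 0.3]],
        devices={"gpu": [1.0, 2.0], "cpu": [3.0, 1.0]},
        budgets={"gpu": [5.0], "cpu": [6.0]},
    )
    print(subnets)
    print(progressive_data_prune({0: 0.05, 1: 2.0, 2: 0.4, 3: 1.1}, drop_fraction=0.5))
```

With these toy numbers, the single ranking yields different widths per device, `[3, 1]` for the hypothetical "gpu" and `[1, 2]` for the hypothetical "cpu", because the two cost models penalize the layers differently. Calling `progressive_data_prune` once per epoch on the surviving samples would shrink the fine-tuning set gradually, which is one plausible reading of "progressive"; the paper's actual sample-importance criterion and schedule may differ.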

External IDs: dblp:journals/tcsv/WangLFTZCJC24