Hardware-Friendly Acceleration for Deep Neural Networks with Micro-Structured Compression

Published: 01 Jan 2022, Last Modified: 10 Nov 2023 (FCCM 2022)
Abstract: Deep Neural Network (DNN) compression techniques, including weight pruning and quantization, have achieved great success in reducing the number of model parameters and computations across various applications. However, existing studies rarely consider two critical targets jointly: enhancing the computation and resource-utilization efficiency that is essential for DNN acceleration on hardware, while maintaining the original model performance, such as accuracy in classification tasks or peak signal-to-noise ratio (PSNR) in super-resolution tasks. Approaches like coarse-grained structured pruning (filter, channel, etc.) and low-precision quantization (binary, ternary, or fixed-point with 4 bits or fewer) suffer from non-negligible accuracy loss, while unstructured pruning incurs extra indexing overhead and degrades computation parallelism.
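The abstract itself gives no code, but the trade-off it describes can be made concrete with a minimal NumPy sketch contrasting unstructured pruning with block-structured pruning (in the spirit of the micro-structured approach, though not the paper's actual algorithm). The function names, the 4x4 block size, and the magnitude-based scoring here are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch only -- not the paper's method. Function names,
# block size, and the magnitude-scoring heuristic are assumptions.

def unstructured_prune(w, sparsity):
    """Zero out the smallest-magnitude weights individually.
    The irregular zero pattern needs per-element indices at inference,
    which is the indexing overhead the abstract refers to."""
    k = int(w.size * sparsity)
    thresh = np.sort(np.abs(w), axis=None)[k]
    return np.where(np.abs(w) < thresh, 0.0, w)

def block_prune(w, sparsity, block=(4, 4)):
    """Zero out whole fixed-size blocks ranked by mean magnitude.
    The regular zero pattern maps onto parallel hardware lanes
    without per-element indexing."""
    rows, cols = w.shape
    br, bc = block
    scores = np.abs(w).reshape(rows // br, br, cols // bc, bc).mean(axis=(1, 3))
    k = int(scores.size * sparsity)
    thresh = np.sort(scores, axis=None)[k]
    mask = (scores >= thresh).repeat(br, axis=0).repeat(bc, axis=1)
    return w * mask

rng = np.random.default_rng(0)
w = rng.standard_normal((16, 16)).astype(np.float32)
print(np.count_nonzero(unstructured_prune(w, 0.5)))  # ~128 scattered survivors
print(np.count_nonzero(block_prune(w, 0.5)))         # 128 survivors in 8 dense 4x4 blocks
```

Both variants remove about half the weights, but the block-pruned matrix keeps its nonzeros in dense tiles, which is what makes such sparsity patterns hardware-friendly.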