SWIS - Shared Weight bIt Sparsity for Efficient Neural Network Acceleration

Dec 21, 2020 (edited Feb 26, 2021) · tinyML 2021 Regular
  • Keywords: machine learning, accelerators, neural networks, quantization
  • TL;DR: SWIS - a methodology for more efficient neural network quantization and execution on bit-serial accelerators.
  • Abstract: Quantization is spearheading increases in the performance and efficiency of neural network computing systems, and is making headway into commodity hardware. We present SWIS - Shared Weight bIt Sparsity, a quantization framework for efficient neural network inference acceleration that delivers improved performance and storage compression through an offline weight decomposition and scheduling algorithm. SWIS achieves up to a 52 (19.8) percentage-point accuracy improvement when quantizing MobileNet-v2 to 4 (2) bits post-training (with retraining), showing the strength of leveraging shared bit sparsity in weights. The SWIS accelerator gives up to a 6X speedup and a 1.8X energy improvement over state-of-the-art bit-serial architectures.
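The core idea named in the abstract, approximating weights as signed sums of power-of-two terms drawn from a small set of bit positions shared across a weight group, can be illustrated with a minimal sketch. This is a hypothetical greedy decomposition written for illustration, not the paper's actual algorithm; the function name, shift set, and term budget are assumptions.

```python
import numpy as np

def shared_bit_quantize(weights, shared_shifts, terms_per_weight):
    """Illustrative sketch: approximate each weight as a signed sum of
    power-of-two terms, drawn only from a small shared set of bit
    positions (shifts). Greedy residual matching; not the SWIS algorithm.
    """
    approx = np.zeros_like(weights, dtype=float)
    for i, w in enumerate(weights):
        residual = w
        for _ in range(terms_per_weight):
            # Candidate terms: +/- 2^p for each shared bit position p.
            cands = [s * (2.0 ** p) for p in shared_shifts for s in (1.0, -1.0)]
            # Pick the candidate that best reduces the current residual.
            best = min(cands, key=lambda c: abs(residual - c))
            if abs(residual - best) >= abs(residual):
                break  # no shared term improves the approximation
            approx[i] += best
            residual -= best
    return approx
```

Restricting all weights in a group to the same few shifts is what makes a bit-serial accelerator skip the unused bit positions, which is the source of the speedup the abstract reports.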