SWIS - Shared Weight bIt Sparsity for Efficient Neural Network Acceleration

Shurui Li; Wojciech Romaszkan; Alexander Graening; Puneet Gupta

Published: 07 Feb 2021, Last Modified: 05 May 2023, tinyML 2021 Regular, Readers: Everyone

Keywords: machine learning, accelerators, neural networks, quantization

TL;DR: SWIS - a methodology for more efficient neural network quantization and execution on bit-serial accelerators.

Abstract: Quantization is spearheading the increase in performance and efficiency of neural network computing systems making headway into commodity hardware. We present SWIS (Shared Weight bIt Sparsity), a quantization framework for efficient neural network inference acceleration that delivers improved performance and storage compression through an offline weight decomposition and scheduling algorithm. SWIS can achieve up to a 52 (19.8) percentage-point accuracy improvement when quantizing MobileNet-v2 to 4 (2) bits post-training (with retraining), showing the strength of leveraging shared bit sparsity in weights. The SWIS accelerator gives up to 6X speedup and 1.8X energy improvement over state-of-the-art bit-serial architectures.
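The abstract does not spell out the decomposition algorithm, so the following is only a rough illustration of what sharing bit positions across a group of weights could look like, not the authors' method. It assumes unsigned integer weights and a brute-force search over bit positions; the function name `quantize_group` and its parameters are hypothetical.

```python
from itertools import combinations


def quantize_group(weights, total_bits=8, kept_bits=2):
    """Approximate each weight using only `kept_bits` bit positions
    that are shared by every weight in the group (illustrative sketch).

    Returns (squared_error, shared_positions, approximated_weights).
    """
    best = None
    for positions in combinations(range(total_bits), kept_bits):
        # Every value that can be formed from this shared set of bit positions.
        levels = [
            sum(((mask >> i) & 1) << p for i, p in enumerate(positions))
            for mask in range(1 << kept_bits)
        ]
        # Map each weight to the nearest representable value.
        approx = [min(levels, key=lambda v: abs(v - w)) for w in weights]
        err = sum((a - w) ** 2 for a, w in zip(approx, weights))
        # Keep the first set of positions with the lowest squared error.
        if best is None or err < best[0]:
            best = (err, positions, approx)
    return best


if __name__ == "__main__":
    err, positions, approx = quantize_group([5, 4, 1, 0], total_bits=3, kept_bits=2)
    print(err, positions, approx)
```

In this toy group, every weight only uses bits 0 and 2, so two shared bit positions represent all four 3-bit values exactly. Real weight groups would generally incur some approximation error, and an exhaustive search like this one would not scale to large groups or bit widths.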
