Full-stack optimization for accelerating CNNs using powers-of-two weights with FPGA validation

Bradley McDanel, Sai Qian Zhang, H. T. Kung, Xin Dong

2019 (modified: 23 Nov 2022)ICS 2019Readers: Everyone

Abstract: We present a full-stack optimization framework for accelerating inference of CNNs (Convolutional Neural Networks) and validate the approach with a field-programmable gate array (FPGA) implementation. By jointly optimizing CNN models, computing architectures, and hardware implementations, our full-stack approach achieves unprecedented performance in the trade-off space characterized by inference latency, energy efficiency, hardware utilization, and inference accuracy. An FPGA implementation is used as the validation vehicle for our design, achieving a 2.28ms inference latency for the ImageNet benchmark. Our implementation shines in that it has 9x higher energy efficiency compared to other implementations while achieving comparable latency. A highlight of our approach which contributes to the achieved high energy efficiency is an efficient Selector-Accumulator (SAC) architecture for implementing CNNs with powers-of-two weights. Compared to an FPGA implementation for a traditional 8-bit MAC, SAC substantially reduces required hardware resources (4.85x fewer lookup tables) and power consumption (2.48x).

0 Replies