E4SA: An Ultra-Efficient Systolic Array Architecture for 4-Bit Convolutional Neural Networks

Published: 01 Jan 2024, Last Modified: 13 Nov 2024, Venue: FPGA 2024, License: CC BY-SA 4.0
Abstract: Many studies have demonstrated that 4-bit precision quantization can achieve accuracy comparable to that of floating-point DNNs, sparking significant interest in efficiently accelerating compressed DNNs, especially 4-bit convolutions, on edge devices. However, we observe that conventional systolic array (SA) architectures designed for DNNs cannot fully exploit the high DSP computational density offered by 4-bit DSP packing. Although state-of-the-art FPGA-based SA architectures (e.g., AutoSA) are flexible enough to accommodate 4-bit DSP packing, they suffer from resource consumption and data supply latency issues, especially when adapting to various convolution spatial sizes. This work introduces a customizable and ultra-efficient SA architectural template for 4-bit convolution, called E4SA. First, we propose a fine-grained row-temporal weight stationary dataflow that aligns with the specific requirements of 4-bit DSP full packing (4bF packing). Based on this, we design a cost-effective SA unit (SAU) composed of 4bF-packing-based processing elements (PEs) to enhance computational efficiency. The SAU includes column-shared packed-data splitters and shift-register-based feature-map/weight fetchers to ensure continuous data supply, all of which are locally interconnected via more cost-effective registers. In addition, we develop a two-level hierarchical SA that decomposes the original large SA into parallel 4×4 SAU sets. This decomposition not only allows multiple PEs in the same column to share data splitting and reorganization logic, thus reducing the LUT overhead, but also maintains near-theoretical latency across various convolutional spatial sizes. Experimental results demonstrate that E4SA achieves up to 576.6 GOPS with 13.8× higher GOPS/DSP efficiency and 51.6× higher GOPS/kLUTs efficiency compared to a 4-bit AutoSA-based design.
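To illustrate the general idea behind DSP packing that the abstract refers to, the following is a minimal sketch of how two 4-bit multiplications that share one operand can be folded into a single wide multiplication. It is not the paper's 4bF packing scheme, whose details are not given in the abstract. The function names, the 8-bit lane width, and the restriction to unsigned operands are all assumptions made for illustration; signed operands would need an extra correction step that is omitted here.

```python
# Illustrative sketch only: generic operand packing for unsigned 4-bit values.
# Names and the 8-bit lane width are hypothetical, not taken from E4SA.

def pack_weights(w0, w1, shift=8):
    """Place two unsigned 4-bit weights in separate lanes of one integer."""
    assert 0 <= w0 < 16 and 0 <= w1 < 16
    return w0 | (w1 << shift)

def packed_multiply(a, w0, w1, shift=8):
    """Compute a*w0 and a*w1 with a single wide multiplication.

    Each product is at most 15 * 15 = 225, which fits in 8 bits,
    so the two lanes cannot overflow into each other.
    """
    assert 0 <= a < 16
    wide = a * pack_weights(w0, w1, shift)  # one multiplier, two results
    mask = (1 << shift) - 1
    return wide & mask, wide >> shift

def check_all():
    """Exhaustively compare against two separate multiplications."""
    for a in range(16):
        for w0 in range(16):
            for w1 in range(16):
                assert packed_multiply(a, w0, w1) == (a * w0, a * w1)
    return True
```

In this sketch, one multiplier produces two products per operation, which is the kind of density gain per DSP that packing aims for. The logic that splits the wide result back into lanes corresponds loosely to the "packed-data splitters" mentioned in the abstract, though the actual hardware design may differ.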