Abstract: In Convolutional Neural Networks (CNNs), convolution operations offer multiple forms of data reuse. However, the conventional Weight Stationary (WS) dataflow does not adequately exploit the available local data reuse. In addition, the local data reuse of a Systolic Array (SA) implementing WS is limited by the array size. To address these issues, we propose a novel dataflow called Enhanced Weight Stationary (EWS) for SA-based CNN accelerators, together with a customized architecture that enhances data reuse. Our approach makes the mapping of weights onto Processing Elements (PEs) more flexible by using Weight Register Files (WRF) in the PE array. Additionally, by incorporating Activation Register Files (ARF) and Partial Sum Register Files (PRF), our accelerator enables convolutional reuse of input feature maps (ifmaps) and reuse of partial sums (psums) during channel-wise accumulation, which effectively reduces accesses to on-chip SRAM. Experimental results show that our CNN accelerator employing EWS achieves $1.22\times$ to $1.72\times$ the throughput and $1.35\times$ to $2.48\times$ the energy efficiency of the WS dataflow as the array size varies from 16 to 64.
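The convolutional reuse of ifmaps mentioned above can be illustrated with a toy 1-D model. This is only a sketch under our own assumptions, not the EWS architecture itself: the function names, the single-channel 1-D convolution, and the `sram_reads` counter are all hypothetical, and the sliding list merely stands in for an activation register file. It assumes `len(ifmap) >= len(weights)`.

```python
def conv1d_no_reuse(ifmap, weights):
    """Baseline: every multiply fetches its activation from (simulated) SRAM."""
    k = len(weights)
    sram_reads = 0
    out = []
    for i in range(len(ifmap) - k + 1):
        acc = 0
        for j in range(k):
            acc += ifmap[i + j] * weights[j]
            sram_reads += 1  # one SRAM read per multiply-accumulate
        out.append(acc)
    return out, sram_reads


def conv1d_with_arf(ifmap, weights):
    """Activations are held in a k-entry sliding buffer (hypothetical ARF stand-in),
    so each ifmap element is fetched from SRAM exactly once."""
    k = len(weights)
    arf = list(ifmap[:k - 1])  # preload the first k-1 activations
    sram_reads = k - 1
    out = []
    for i in range(k - 1, len(ifmap)):
        arf.append(ifmap[i])   # fetch one new activation from SRAM
        sram_reads += 1
        out.append(sum(a * w for a, w in zip(arf, weights)))
        arf.pop(0)             # slide the window by one position
    return out, sram_reads


if __name__ == "__main__":
    ifmap = [1, 2, 3, 4, 5, 6, 7, 8]
    weights = [1, 0, -1]
    print(conv1d_no_reuse(ifmap, weights))
    print(conv1d_with_arf(ifmap, weights))
```

Both functions produce the same outputs; in this toy model the buffered version needs one SRAM read per ifmap element rather than one per multiply, which is the kind of access reduction that local register files are meant to provide.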