Abstract: State-of-the-art deep neural network (DNN) models, with millions of connections, are both memory prohibitive and computationally intensive. Deploying these models in an embedded mobile application is resource constrained, with large power consumption and a significant bandwidth requirement for accessing data from external DRAM. In custom FPGA hardware, the cost of accessing data from DRAM is two to three times higher than that of a MAC (Multiply-Accumulate) operation. In this paper, we propose a power-efficient multi-bit neural network accelerator that truncates the partial-sum (PSum) results from the previous layer before feeding them into the next layer. We demonstrate that, using our multi-bit accelerator, accuracy is maintained up to a bit width of 12. The proposed truncation scheme achieves a 50% power reduction, and resource utilization is reduced by 16% for LUTs (Look-Up Tables), 9% for FFs (Flip-Flops), 19% for BRAMs (Block RAMs), and 7% for DSPs (Digital Signal Processors) when compared with the 32-bit architecture. AlexNet, a large network, was used as the benchmark DNN model, and a Kintex-7 KC705 FPGA was used to test the architecture.
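To make the truncation step concrete, the following is a minimal sketch, assuming a 32-bit signed accumulator and a keep-most-significant-bits policy; the function name `truncate_psum` and the shift-based discard of low-order bits are illustrative assumptions, not the authors' hardware implementation.

```python
# Illustrative sketch (not the authors' implementation): truncate a signed
# partial sum (PSum) from the previous layer to a reduced bit width before
# it is fed into the next layer.

def truncate_psum(psum: int, in_bits: int = 32, out_bits: int = 12) -> int:
    """Keep the top `out_bits` of a signed `in_bits`-wide partial sum."""
    shift = in_bits - out_bits        # number of LSBs discarded (assumed policy)
    return psum >> shift              # Python's right shift on ints preserves sign

# Example: a full-precision 32-bit PSum reduced to a 12-bit value
full_psum = 0x0012_3456
print(truncate_psum(full_psum))       # value handed to the next layer
```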