Abstract: Deep neural networks have demonstrated unprecedented results on core AI and computer vision tasks. They are typically executed on general-purpose GPUs with large form factors and high power consumption, making them unsuitable for mobile deployment. We present a VLSI architecture that executes quantized, low-precision convolutional neural networks (CNNs). Compared with high-precision designs, our approach significantly reduces the power consumed by memory access and increases processing speed within a limited area budget, making it particularly suitable for mobile applications. We propose a dataflow with a high data reuse rate designed specifically for quantized models. To fully exploit low-precision data, we also design a microarchitecture for subword-parallel computing on low bit-length data, an on-chip memory hierarchy and data realignment flow that save power and avoid buffer bank conflicts, and a corresponding processing element (PE) array. The architecture is highly flexible across various CNNs and reconfigurable for low bit-length quantized models. We have implemented the proposed VLSI architecture using the TSMC 90 nm cell library. At a hardware cost of 180 KB of on-chip memory and 1,340k logic gates, the implementation achieves state-of-the-art hardware efficiency.
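To make the subword-parallelism idea concrete, the following is a minimal sketch (not the paper's actual microarchitecture) of how one wide multiplier can process several packed low bit-length operands at once: two 4-bit activations are packed into separate 8-bit lanes of a 16-bit word, and a single multiply by a 4-bit weight yields both partial products, since each product fits within its lane. All variable names and the lane layout here are illustrative assumptions.

```c
#include <stdint.h>
#include <stdio.h>

int main(void) {
    /* Illustrative 4-bit activations and weight (assumed values). */
    uint8_t a0 = 9, a1 = 13;
    uint8_t w  = 11;

    /* Pack the two 4-bit activations into separate 8-bit lanes.
     * Each partial product a_i * w is at most 15*15 = 225 < 256,
     * so the lanes cannot overflow into each other. */
    uint16_t packed = (uint16_t)(a0 | (a1 << 8));

    /* One 16x4-bit multiply computes both products in parallel. */
    uint32_t prod = (uint32_t)packed * w;

    /* Extract the per-lane results. */
    uint16_t p0 = prod & 0xFF;         /* lane 0: a0 * w */
    uint16_t p1 = (prod >> 8) & 0xFF;  /* lane 1: a1 * w */

    printf("a0*w = %u (expect %u)\n", p0, a0 * w);
    printf("a1*w = %u (expect %u)\n", p1, a1 * w);
    return 0;
}
```

The same lane-packing principle scales to wider datapaths (e.g., four 4-bit lanes in a 32-bit word), provided enough guard bits are reserved so partial products never cross lane boundaries.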