Abstract: Lightweight convolutional neural networks (CNNs) on tiny embedded platforms can offer an energy-efficient solution for today's IoT devices. However, CNN implementations on embedded systems face a processing bottleneck in the convolutional layers and memory storage issues in the fully connected layers. In recent years, heterogeneous acceleration, in which compute-intensive tasks are performed on kernel-specific cores, has gained attention. In this paper, we propose PACENet (Programmable many-core ACcElerator for convolution neural Network), a domain-specific, programmable accelerator architecture. Its instruction set architecture includes neural-network-kernel-specific instructions such as convolution, maxpool, and ReLU. To demonstrate the efficiency of PACENet, we implemented ResNet-20 for the CIFAR-10 dataset, with PACENet executing the convolutional layers, ReLU activations, max-pooling layer, and fully connected layer. We also implemented ResNet-20 for CIFAR-10 on the NVIDIA TX1 mobile GPU platform using the TensorFlow and cuDNN libraries. Compared with the TX1 implementation, the PACENet implementation runs 1.4× to 4.5× faster and consumes 2.8× to 9× less energy. PACENet also achieves 2.9× to 9.3× higher throughput per watt than the TX1 implementation.