Design and Develop Hardware Aware DNN for Faster Inference

S. Rajarajeswari, Annapurna P. Patil, Aditya Madhyastha, Akshat Jaitly, Himangshu Shekhar Jha, Sahil Rajesh Bhave, Mayukh Das, N. S. Pradeep

Published: 01 Jan 2022, Last Modified: 17 May 2023, IntelliSys (3) 2022, Readers: Everyone

Abstract: Advanced learning models have become standard on many small-scale devices, which makes reducing inference time a pressing need. This paper presents a hardware-aware methodology, organized as a sequential pipeline, that automates Deep Neural Network (DNN) customization to shrink network size and reduce inference time. MorphNet sits at the pipeline’s core and iteratively shrinks and expands a network: a resource-weighted sparsifying regularizer applied to layer activations identifies inefficient neurons, which are pruned, and all layers are then expanded by a uniform multiplicative factor. This is followed by fusion, a technique that combines each frozen batch normalization layer with the preceding convolution layer. Finally, the customized DNN is retrained using a Knowledge Distillation approach to preserve model accuracy. The approach shows promising initial results on the MobileNetv1 and ResNet50 architectures.
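
The fusion step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the standard batch normalization formula with frozen statistics, treats the convolution kernel as flattened per-output-channel weight rows, and uses the hypothetical helper names `fold_batchnorm`, `linear`, and `batchnorm`. The point is that a frozen batch normalization layer is a per-channel affine map, so it can be absorbed into the preceding layer's weights and bias.

```python
import math


def fold_batchnorm(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold frozen batch-norm parameters into the preceding layer.

    weights: one row per output channel (a conv kernel flattened to
             [out_channels][in_channels * kh * kw]); hypothetical layout.
    bias, gamma, beta, mean, var: one float per output channel.
    Returns (folded_weights, folded_bias).
    """
    folded_w, folded_b = [], []
    for w_row, b, g, bt, mu, v in zip(weights, bias, gamma, beta, mean, var):
        scale = g / math.sqrt(v + eps)               # per-channel BN scale
        folded_w.append([w * scale for w in w_row])  # scale every weight
        folded_b.append((b - mu) * scale + bt)       # shift the bias
    return folded_w, folded_b


def linear(weights, bias, x):
    """Apply the layer at a single spatial position (dot product per channel)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def batchnorm(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-mode batch norm using frozen running statistics."""
    return [g * (yi - mu) / math.sqrt(v + eps) + bt
            for yi, g, bt, mu, v in zip(y, gamma, beta, mean, var)]


# Usage: the fused layer reproduces "layer followed by batch norm".
w = [[0.5, -1.0, 2.0], [1.5, 0.25, -0.75]]
b = [0.1, -0.2]
gamma, beta = [1.2, 0.8], [0.05, -0.3]
mean, var = [0.4, -0.1], [0.9, 2.5]
x = [1.0, 2.0, -3.0]

reference = batchnorm(linear(w, b, x), gamma, beta, mean, var)
fused_w, fused_b = fold_batchnorm(w, b, gamma, beta, mean, var)
fused = linear(fused_w, fused_b, x)
assert all(abs(a - r) < 1e-9 for a, r in zip(fused, reference))
```

Because the scale is applied per output channel, the same folding works for any kernel size; after fusion the batch normalization layer can be dropped at inference time, which saves one pass over the activations.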
