A Framework for Asymmetrical DNN Modularization for Optimal Loading

Published: 01 Jan 2021, Last Modified: 17 May 2023 · IJCNN 2021
Abstract: The modern era of artificial intelligence is largely driven by Deep Neural Networks (DNNs). As a result, most intelligent/smart apps running on edge devices (mobile phones, televisions, etc.) rely on DNNs for their predictive ability. DNN-based prediction suffers from operational overhead, which is the sum of model loading latency and inference latency. Model loading latency affects the first response of a DNN-powered app, whereas inference latency affects subsequent responses. Since app switching has become common practice among edge-device users, it is of utmost interest to make switching smooth by reducing the model loading latency. In this paper, asymmetrical DNN modularization is proposed as a potential solution. The proposed method solves two distinct problems: (a) it improves model loading latency by loading all modules (child models) of a given DNN model in parallel, and (b) it aids on-device training by keeping live gradients only for the last child model. The decisions about the modularization index and the corresponding split positions are taken by a reinforcement learning unit (RLU). The RLU takes into account the hardware resources available on-device (e.g., hardware threads) and the loading latency of each layer on its dedicated compute unit. In response, it provides the best modularization index $k$ and the corresponding positions $\vec{p}=(p_{1},p_{2},\ldots,p_{k})$, specific to the DNN model and device, where $p_{i}$ is the end position of child $i$. The proposed method has shown significant loading-time improvement (up to 7X) on popular DNNs used for camera use-cases. In addition to improving loading latency, the proposed modularization method facilitates on-device personalization by separating the module with trainable layers and loading it only while training on-device.
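As a rough illustration of the two mechanisms described in the abstract, the sketch below loads $k$ child models concurrently (one worker per hardware thread) and then freezes every child except the last, so that only the final module keeps live gradients during on-device training. This is a minimal sketch, not the authors' implementation: the file paths, the per-child serialization, and the use of PyTorch are assumptions made for the example.

```python
# Illustrative sketch only: the file layout, loader, and use of PyTorch
# are assumptions, not the paper's actual implementation.
from concurrent.futures import ThreadPoolExecutor

import torch
import torch.nn as nn


def load_child(path):
    # Hypothetical loader: each child model is assumed to be serialized
    # separately (e.g., with torch.save), so it can be read independently.
    return torch.load(path)


def load_model_parallel(child_paths, num_threads):
    # Load all k child models concurrently. With k <= num_threads, the total
    # load time approaches max(load_i) instead of the sequential sum(load_i),
    # which is the source of the loading-time improvement.
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return list(pool.map(load_child, child_paths))


def prepare_for_on_device_training(children):
    # Keep live gradients only for the last child model: earlier children are
    # frozen, so no gradient buffers are allocated for them during training.
    for child in children[:-1]:
        for param in child.parameters():
            param.requires_grad = False
    return nn.Sequential(*children)


if __name__ == "__main__":
    # Hypothetical files produced by splitting the DNN at positions
    # p = (p1, ..., pk); here k = 4.
    paths = [f"model_child_{i}.pt" for i in range(4)]
    children = load_model_parallel(paths, num_threads=4)
    model = prepare_for_on_device_training(children)
```

In an actual deployment one would expect the loader to use the inference framework's native serialization and to map each child to the compute unit assumed by the RLU's latency model; the sketch only conveys the structure of the idea.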