Reinforcement Learning Based Asymmetrical DNN Modularization for Optimal Loading

Sep 28, 2020 (edited Sep 23, 2021) · ICLR 2021 Conference Blind Submission · Readers: Everyone
  • Keywords: DNN Compression, Loading time
  • Abstract: The latency of DNN (Deep Neural Network) based prediction is the sum of model loading latency and inference latency. Model loading latency affects the first response from an application, whereas inference latency affects subsequent responses. Since model loading latency is directly proportional to model size, this work aims to improve the response time of an intelligent app by reducing loading latency. The speedup is gained by asymmetrically modularizing the given DNN model into several small child models and loading them in parallel. The number of feasible child models and their corresponding split positions are decided by a reinforcement learning unit (RLU). The RLU takes into account the hardware resources available on-device and provides the best split count $k$ and split positions $\vec{p}$ specific to the DNN model and device, where $\vec{p}=(p_1, p_2, ..., p_k)$ and $p_i$ is the end position of the $i^{th}$ child model $M_i$. The proposed method shows significant loading improvement (up to 7X) on popular DNNs used in camera use-cases, and can thus speed up app response. In addition, the RLU-driven approach facilitates on-device personalization by isolating the trainable layers in a single module and loading only that module during on-device training.
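The split-and-load-in-parallel idea from the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: pickled lists of layers stand in for serialized child models, the function names and file layout are hypothetical, and the split positions $\vec{p}$ are assumed to be given (in the paper they come from the RLU).

```python
import os
import pickle
from concurrent.futures import ThreadPoolExecutor

def save_child_models(layers, split_positions, directory):
    """Split a flat list of layers at the given end positions p_1..p_k
    and serialize each child model M_i to its own file (hypothetical layout)."""
    paths, start = [], 0
    for i, end in enumerate(split_positions):
        path = os.path.join(directory, f"child_{i}.pkl")
        with open(path, "wb") as f:
            pickle.dump(layers[start:end], f)
        paths.append(path)
        start = end
    return paths

def load_model_parallel(paths):
    """Load all child models concurrently, then concatenate them in order
    to recover the original layer sequence."""
    def load_one(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    with ThreadPoolExecutor(max_workers=len(paths)) as pool:
        children = list(pool.map(load_one, paths))
    return [layer for child in children for layer in child]
```

With $k$ loader threads, wall-clock loading time is bounded by the largest child rather than the whole model, which is why the choice of split positions (and hence the RLU) matters: an asymmetric split can balance child sizes against the device's available I/O and memory bandwidth.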
  • One-sentence Summary: This work proposes an application of reinforcement learning for reducing DNN loading time.
  • Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics