TESLA: Task-wise Early Stopping and Loss Aggregation for Dynamic Neural Network Inference


Nov 03, 2017 (modified: Nov 03, 2017) ICLR 2018 Conference Blind Submission
  • Abstract: For inference in deep neural networks on end devices, it is desirable to deploy a single pre-trained neural network model that can dynamically scale across a range of computation budgets without compromising accuracy. To achieve this goal, Incomplete Dot Product (IDP) has been proposed, which uses only a subset of terms in dot products during forward propagation. However, IDP has limitations, including noticeable performance degradation in operating regions with low computational cost, and an inherent performance ceiling because it relies on hand-crafted profile coefficients. In this paper, we extend IDP by proposing new training algorithms involving a single profile, which may be trainable or pre-determined, to significantly improve overall performance, especially in operating regions with low computational cost. Specifically, we propose the Task-wise Early Stopping and Loss Aggregation (TESLA) algorithm which, as demonstrated with our 3-layer multilayer perceptron on MNIST, achieves 94.7% accuracy on average and outperforms the original IDP by 32% when only 10% of the dot-product terms are used. We further introduce trainable profile coefficients, with which TESLA improves the accuracy to 95.5% without specifying coefficients in advance. We also apply TESLA to the VGG-16 model on CIFAR-10, where it achieves 80% accuracy using only 20% of the dot-product terms, a regime in which the original IDP does not work. Finally, we visualize the learned representations at different dot-product percentages via class activation maps and show that, with TESLA, the learned representations can adapt over a wide range of operating regions.
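To make the core IDP idea concrete, here is a minimal NumPy sketch of a dot product truncated to a fraction of its terms and scaled by a decreasing profile. The function name, the linear profile, and all arguments are illustrative assumptions, not the paper's exact formulation (the paper learns or pre-specifies its own coefficient profiles).

```python
import numpy as np

def incomplete_dot_product(w, x, keep_fraction, coeffs=None):
    """Illustrative sketch of IDP: compute a dot product using only the
    first `keep_fraction` of the terms, each scaled by a monotonically
    decreasing profile coefficient. The linear profile below is just one
    possible hand-crafted choice, not the paper's exact scheme."""
    n = len(w)
    k = max(1, int(np.ceil(keep_fraction * n)))  # number of terms kept
    if coeffs is None:
        # Linearly decaying profile: earlier terms contribute more.
        coeffs = 1.0 - np.arange(n) / n
    return float(np.sum(coeffs[:k] * w[:k] * x[:k]))

w = np.array([0.5, -0.2, 0.1, 0.4])
x = np.array([1.0, 2.0, -1.0, 0.5])
full = incomplete_dot_product(w, x, 1.0)   # all terms, profile-weighted
half = incomplete_dot_product(w, x, 0.5)   # only the first half of the terms
```

With a uniform (all-ones) profile and `keep_fraction=1.0`, the function reduces to an ordinary dot product; lowering `keep_fraction` trades accuracy for computation, which is the dynamic-scaling behavior TESLA's training objective targets.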