SelfXit: An Unsupervised Early Exit Mechanism for Deep Neural Networks

Published: 31 Oct 2024, Last Modified: 31 Oct 2024. Accepted by TMLR. License: CC BY 4.0
Abstract: Deep Neural Networks (DNNs) have become an essential component in many application domains, including web-based services. Many of these services require high throughput and (close to) real-time processing, for instance, to respond to users' requests or to process a stream of incoming data on time. However, the trend in DNN design is towards larger models with many layers and parameters to achieve more accurate results. Although these models are often pre-trained, their computational complexity can still be substantial, hindering low inference latency. In this paper, we propose SelfXit, an end-to-end automated early exiting solution that improves the performance of DNN-based vision services in terms of computational complexity and inference latency. SelfXit adopts the ideas of self-distillation of DNN models and early exits specifically for vision applications. The proposed solution is an automated, unsupervised early exiting mechanism that allows a large model to exit early at inference time if one of the early exit models is confident enough to make the final prediction. A main contribution of this paper is that the mechanism is unsupervised: the early exit models do not need access to training data and operate solely on the incoming data at run-time, making the approach suitable for applications that use pre-trained models. The results of our experiments on two vision tasks (image classification and object detection) show that, on average, early exiting can reduce the computational complexity of these services by up to 58% (in terms of FLOP count) and improve their inference latency by up to 46% with little to no reduction in accuracy. SelfXit also outperforms existing methods, particularly on complex models and larger datasets. On CIFAR-100/ResNet-50 it reduces latency by 51.6% and 30.4% while increasing accuracy by 2.31% and 0.72%, on average, compared to GATI and BranchyNet, respectively.
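
For readers unfamiliar with confidence-based early exiting, the sketch below illustrates the general idea described in the abstract; it is a minimal illustration under assumed names (backbone_stages, exit_heads, threshold), not the SelfXit implementation from the linked repository.

```python
import torch
import torch.nn.functional as F

def early_exit_inference(backbone_stages, exit_heads, x, threshold=0.9):
    """Run the backbone stage by stage and stop at the first confident exit.

    backbone_stages: list of nn.Module partitions of the large model; the last
        stage is assumed to produce the final class logits.
    exit_heads: list of lightweight classifier heads, one per intermediate stage.
    threshold: softmax-confidence level above which an early prediction is accepted.
    Assumes single-sample (batch size 1) inference for simplicity.
    """
    h = x
    for i, stage in enumerate(backbone_stages):
        h = stage(h)
        if i < len(exit_heads):
            logits = exit_heads[i](h)
            conf, pred = F.softmax(logits, dim=-1).max(dim=-1)
            if conf.item() >= threshold:
                # Confident enough: skip the remaining stages of the large model.
                return pred, i
    # No exit head was confident; fall through to the full model's prediction.
    return h.argmax(dim=-1), len(backbone_stages) - 1
```

In this sketch the trade-off between latency and accuracy is controlled by the confidence threshold: a higher threshold sends more inputs through the full backbone, while a lower one exits earlier more often.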
Submission Length: Long submission (more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=RxJhDyegpU&noteId=RxJhDyegpU
Changes Since Last Submission: For the camera-ready version, we have addressed all the minor revisions and removed the highlights. Additionally, we have created a code repository for the project.
Code: https://github.com/hoseinkhs/AutoCacheLayer/
Assigned Action Editor: ~Evan_G_Shelhamer1
Submission Number: 2748