Performance Control in Early Exiting to Deploy Large Models at the Same Cost of Smaller Ones

Published: 21 Jun 2024, Last Modified: 26 Jul 2024, ES-FoMo-II 2024 Poster, CC BY 4.0
Keywords: Efficient Inference, Early Exiting, Performance Control, Calibration, Classification
Abstract: Early Exiting (EE) is a promising technique for speeding up inference at the cost of a limited performance loss. It adaptively allocates compute to each datapoint based on its difficulty, exiting at earlier layers for easier inputs. In this study, we first present a novel perspective on EE by demonstrating that it should be used to deploy larger models: achieving higher performance while maintaining the low computational cost of small models. As existing EE approaches rely on confidence estimation at each exit point, we further study the impact of overconfidence on the controllability of the compute/performance trade-off. We introduce PCEE (Performance Control Early Exiting), a method that ensures a lower bound on accuracy, thereby facilitating accurate adaptation of EE methods for practical use. In our experiments with MSDNet and Vision Transformer architectures on CIFAR-10, CIFAR-100, and ImageNet, we show that PCEE offers a simple yet computationally efficient approach that in most cases provides better controllability over performance than standard confidence-based approaches and, interestingly, allows us to scale up model sizes to yield both cost reductions and performance gains.
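To make the confidence-based exiting mechanism described in the abstract concrete, here is a minimal sketch of the standard approach that PCEE builds on: an input passes through a sequence of exit heads (shallow to deep), and inference stops at the first head whose maximum softmax probability clears a threshold. This is an illustrative simplification, not the authors' PCEE algorithm; the function names, the fixed global threshold, and the use of plain logit vectors are all assumptions for the example.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over a 1-D logit vector.
    z = np.exp(logits - logits.max())
    return z / z.sum()

def early_exit_predict(exit_logits, threshold=0.9):
    """Confidence-based early exiting (illustrative sketch).

    exit_logits: list of per-class logit vectors, one per exit point,
                 ordered from the shallowest to the deepest layer.
    threshold:   exit as soon as max softmax probability >= threshold.

    Returns (predicted_class, exit_depth). Falls back to the final
    (deepest) exit if no intermediate head is confident enough.
    """
    for depth, logits in enumerate(exit_logits, start=1):
        probs = softmax(np.asarray(logits, dtype=float))
        if probs.max() >= threshold:
            return int(probs.argmax()), depth
    # No head was confident: use the last head's prediction.
    return int(probs.argmax()), depth

# A "hard" shallow head (near-uniform logits) defers to a confident deep head.
pred, depth = early_exit_predict(
    [np.array([0.1, 0.2, 0.15]),   # shallow exit: low confidence
     np.array([5.0, 0.0, 0.0])],   # deep exit: high confidence
    threshold=0.9,
)
```

As the abstract notes, the weakness of this scheme is that softmax confidence is often miscalibrated (overconfident), so a fixed threshold gives no guarantee on realized accuracy; PCEE instead calibrates the exit decision so that a target accuracy lower bound is respected.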
Submission Number: 86