Keywords: Efficient Inference, Early Exiting, Performance Control, Calibration, Classification
TL;DR: We provide a method for performance control in early exiting and show that larger models coupled with early exiting using this method can achieve lower prediction errors for the same computational budget as smaller models.
Abstract: Early Exiting (EE) is a promising technique for speeding up inference at the cost of limited performance loss. It adaptively allocates compute budget to data points based on their difficulty by exiting at earlier layers when predictions are confident. In this study, we first present a novel perspective on the EE approach, demonstrating that larger models, when deployed with EE, can achieve higher performance than smaller models while maintaining similar computational costs. As existing EE approaches rely on confidence estimation at each exit point, we further study the impact of overconfidence on the controllability of the compute/performance trade-off. We introduce Performance Control Early Exiting (PCEE), a method that enables accuracy thresholding by basing decisions not on a datapoint's confidence but on the average accuracy of samples with similar confidence levels from a held-out validation set. In our experiments with MSDNets and Vision Transformer architectures on CIFAR-10, CIFAR-100, and ImageNet, we show that PCEE offers a simple yet computationally efficient approach that provides better control over performance than standard confidence-based approaches, and allows us to scale up model sizes to yield performance gains while reducing computational cost.
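The exit rule described in the abstract — exiting when the average accuracy of similarly confident validation samples meets a user-specified target, rather than when raw confidence crosses a threshold — can be sketched as follows. This is a minimal illustration under assumed details (uniform confidence binning, the function names, and the bin count are not taken from the paper):

```python
# Hypothetical sketch of the PCEE exit rule: calibrate per-bin accuracy on
# a held-out validation set, then exit when the estimated accuracy for the
# current confidence level reaches the target. Binning scheme and names
# are illustrative assumptions, not the authors' implementation.
import numpy as np

def calibrate_exit(val_conf, val_correct, n_bins=10):
    """Map each confidence bin to the mean accuracy of validation
    samples whose confidence fell in that bin (NaN if the bin is empty)."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bin_acc = np.full(n_bins, np.nan)
    idx = np.clip(np.digitize(val_conf, edges) - 1, 0, n_bins - 1)
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            bin_acc[b] = val_correct[mask].mean()
    return edges, bin_acc

def should_exit(conf, edges, bin_acc, target_acc=0.9):
    """Exit at this layer if the validation accuracy estimated for this
    confidence level meets the user-specified accuracy target."""
    b = int(np.clip(np.digitize(conf, edges) - 1, 0, len(bin_acc) - 1))
    return bool(not np.isnan(bin_acc[b]) and bin_acc[b] >= target_acc)
```

At inference time, the same calibration would be applied per exit point, so each intermediate classifier uses its own accuracy-vs-confidence mapping.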
Primary Area: other topics in machine learning (i.e., none of the above)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 11866