Keywords: Efficient Inference; Early-Exit Models; Recall
TL;DR: We show that recall is essential for provably efficient early-exit strategies, and that under practical settings the optimal strategy reduces to adaptive thresholding with recall.
Abstract: Early-exit (EE) models improve the efficiency of deep neural networks by attaching auxiliary classifiers to intermediate layers, enabling predictions before the final layer and reducing inference latency and cost. A central challenge, however, is the principled design of **provably efficient** exit rules—a dimension often underexplored in practice, where simple confidence thresholding dominates.
We provide theoretical guidance for designing such rules. We prove that exit strategies without recall—including standard thresholding—fail to achieve any constant-factor approximation of the optimal accuracy–latency trade-off. To address this, we formalize and analyze **with-recall** strategies, which permit revisiting earlier exits to balance accuracy and efficiency. Our results show that recall is indispensable for provable performance guarantees.
Empirical evaluations on computer vision tasks further elucidate the structure of optimal exit rules. In these settings, the optimal strategy reduces to adaptive thresholding with recall, offering a theoretical foundation for the practical deployment of early-exit models.
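To make the distinction concrete, below is a minimal Python sketch of the two families of exit rules contrasted in the abstract. This is an illustration under stated assumptions, not the paper's exact strategy: the function names, the per-exit softmax inputs, the layer-specific thresholds, and the particular recall rule (returning the most confident prediction among the exits evaluated so far) are all assumptions introduced here for exposition.

```python
import numpy as np

def exit_without_recall(exit_probs, threshold=0.9):
    """Standard confidence thresholding: commit to the first exit whose
    top-class confidence clears the threshold; earlier exits are never
    revisited, and the final layer serves as a fallback."""
    for layer, probs in enumerate(exit_probs):
        if probs.max() >= threshold:
            return layer, int(probs.argmax())
    return len(exit_probs) - 1, int(exit_probs[-1].argmax())

def exit_with_recall(exit_probs, thresholds):
    """Adaptive thresholding with recall (illustrative sketch): stop
    computing once some exit clears its layer-specific threshold, then
    *recall* the most confident prediction among all exits seen so far."""
    evaluated = []
    for probs, tau in zip(exit_probs, thresholds):
        evaluated.append(probs)
        if probs.max() >= tau:
            break
    best = max(range(len(evaluated)), key=lambda i: evaluated[i].max())
    return best, int(evaluated[best].argmax())

# Toy example: softmax outputs from three auxiliary exits of one network.
exit_probs = [np.array([0.50, 0.30, 0.20]),   # shallow exit, unsure
              np.array([0.92, 0.05, 0.03]),   # mid exit, confident
              np.array([0.97, 0.02, 0.01])]   # final layer
print(exit_without_recall(exit_probs, threshold=0.9))   # -> (1, 0)
print(exit_with_recall(exit_probs, [0.99, 0.90, 0.0]))  # -> (1, 0)
```

In this sketch, recall only changes the answer when a later exit is less confident than an earlier one; the abstract's claim is the stronger statement that without the ability to revisit earlier exits, no thresholding rule achieves a constant-factor approximation of the optimal accuracy–latency trade-off.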
Submission Number: 52