Early-Exit Deep Neural Network - A Comprehensive Survey

Haseena Rahmath P, Vishal Srivastava, Kuldeep Chaurasia, Roberto Gonçalves Pacheco, Rodrigo S. Couto

Published: 21 Nov 2024, Last Modified: 27 Jan 2026 | ACM Computing Surveys | Everyone | CC BY 4.0

Abstract: Deep neural networks (DNNs) typically have a single exit point that makes predictions by running the entire stack of neural layers. Since not all inputs require the same amount of computation to reach a confident prediction, recent research has focused on incorporating multiple “exits” into the conventional DNN architecture. Early-exit DNNs are multi-exit neural networks that attach many side branches to the conventional DNN, enabling inference to stop early at intermediate points. This approach offers several advantages, including speeding up inference, mitigating the vanishing gradient problem, and reducing overfitting and overthinking. It also supports DNN partitioning across devices and is well suited to multi-tier computation platforms such as edge computing. This article decomposes the early-exit DNN architecture and reviews recent advances in the field. The study explores its benefits, designs, training strategies, and adaptive inference mechanisms. Various design challenges, application scenarios, and future directions are also extensively discussed.
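
As a rough illustration of the early-exit idea described in the abstract, the sketch below runs a stack of stages in order and stops at the first side branch whose prediction is confident enough. It is a minimal toy under stated assumptions, not the survey's method: the exit rule (maximum softmax probability compared with a fixed `threshold`) is only one common choice, and the names `early_exit_predict`, `stages`, `exit_heads`, and the toy lambdas are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def early_exit_predict(x, stages, exit_heads, threshold=0.9):
    """Return (label, exit_index) from the first sufficiently confident exit.

    stages     -- backbone blocks, applied in order (hypothetical callables)
    exit_heads -- one side-branch classifier per stage, returning logits
    threshold  -- assumed confidence cutoff on the max softmax probability
    """
    if not stages or len(stages) != len(exit_heads):
        raise ValueError("need one exit head per stage, and at least one stage")
    h = x
    for i, (stage, head) in enumerate(zip(stages, exit_heads)):
        h = stage(h)                      # run the next backbone block
        probs = softmax(head(h))          # side-branch prediction
        confidence = max(probs)
        is_last = i == len(stages) - 1    # the final exit always answers
        if confidence >= threshold or is_last:
            return probs.index(confidence), i


# Toy two-stage "network": plain functions stand in for trained layers.
stages = [lambda h: [2 * v for v in h], lambda h: [v + 1 for v in h]]
exit_heads = [lambda h: h, lambda h: h]

easy = early_exit_predict([5.0, 0.0], stages, exit_heads)  # confident at exit 0
hard = early_exit_predict([0.0, 0.1], stages, exit_heads)  # falls through to exit 1
```

In this toy run the well-separated input leaves at the first branch, while the ambiguous one uses the full stack, which is the source of the inference speed-up the abstract mentions. A real system would use trained layers and tune the threshold per exit.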