Abstract: Deep neural networks (DNNs) typically have a single exit point that makes predictions by running the
entire stack of neural layers. Since not all inputs require the same amount of computation to reach a confident
prediction, recent research has focused on incorporating multiple “exits” into the conventional DNN
architecture. Early-exit DNNs are multi-exit neural networks that attach many side branches to the conventional
DNN, enabling inference to stop early at intermediate points. This approach offers several advantages,
including faster inference, mitigation of the vanishing gradient problem, and reduced overfitting
and overthinking. It also supports DNN partitioning across devices and is well suited to multi-tier
computation platforms such as edge computing. This article decomposes the early-exit DNN architecture
and reviews the recent advances in the field. The study explores its benefits, designs, training strategies, and
adaptive inference mechanisms. Various design challenges, application scenarios, and future directions are
also extensively discussed.
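
As an illustration of the early-exit idea summarized above, the following is a minimal sketch of confidence-based adaptive inference, assuming one common exit rule: stop at the first side branch whose top softmax probability reaches a threshold. The names `early_exit_predict`, `stages`, `exit_heads`, and `threshold` are hypothetical and chosen for this example only. The toy stages and heads stand in for real neural layers and do not correspond to any specific model.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def early_exit_predict(
    x: Vector,
    stages: Sequence[Callable[[Vector], Vector]],
    exit_heads: Sequence[Callable[[Vector], Vector]],
    threshold: float = 0.9,
) -> Tuple[int, float, int]:
    """Run backbone stages in order and stop at the first confident exit.

    Returns (predicted_label, confidence, exit_index). The last exit is
    always taken if no earlier side branch reaches the threshold.
    Assumption: confidence is the maximum softmax probability.
    """
    if not stages or len(stages) != len(exit_heads):
        raise ValueError("need one exit head per backbone stage")
    h = x
    for i, (stage, head) in enumerate(zip(stages, exit_heads)):
        h = stage(h)                 # backbone computation up to exit i
        probs = softmax(head(h))     # side-branch classifier at exit i
        confidence = max(probs)
        is_last = i == len(stages) - 1
        if confidence >= threshold or is_last:
            return probs.index(confidence), confidence, i
    raise AssertionError("unreachable: the last exit always returns")


# Toy model (hypothetical): each stage doubles the features, and each
# exit head uses the features directly as logits.
double = lambda h: [2.0 * v for v in h]
identity = lambda h: list(h)
stages = [double, double, double]
heads = [identity, identity, identity]

easy = early_exit_predict([1.0, 0.0], stages, heads)   # stops before the last exit
hard = early_exit_predict([0.1, 0.0], stages, heads)   # falls through to the last exit
print(easy, hard)
```

In this sketch, the "easy" input leaves the network at the second exit, while the "hard" input uses all three stages. This is the sense in which computation adapts to the input. Other exit criteria, such as entropy-based rules, fit the same loop by replacing the confidence test.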