Strategies at a glance: A comparative analysis of training techniques for optimizing early-exit deep neural networks

Published: 07 Aug 2025, Last Modified: 27 Jan 2026 · Neural Networks · CC BY 4.0
Abstract: Early-exit deep neural networks (DNNs) enable adaptive inference by allowing predictions at intermediate layers, thereby reducing computational cost. However, their performance is highly sensitive to the chosen training strategy—a factor that remains underexplored. This study presents the first systematic comparison of six prominent strategies—Joint, Separate, Branch-wise, Two-stage, Distillation-based, and Hybrid—across three architectures (MobileNet, ResNet, VGG) using core benchmarks (CIFAR-10 and CIFAR-100). To evaluate scalability and domain generalization, we extended experiments to ImageNet-100 and ChestX-ray14. For each setup, we assess convergence behavior, accuracy, overfitting, and training efficiency, supported by statistical validation via ANOVA and Tukey's HSD tests. Results reveal key trade-offs: Joint and Distillation-based strategies offer strong generalization but incur higher computational cost; Two-stage and Branch-wise are prone to overfitting at deeper exits; Separate training underperforms at early exits. In contrast, Hybrid strategies achieve the best balance of accuracy and efficiency. These insights offer practical guidance for optimizing early-exit DNNs under resource constraints and lay a principled foundation for future research on efficient training paradigms.
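To make the early-exit inference pattern described in the abstract concrete, the sketch below shows a toy model where each exit head attaches to intermediate features and inference stops at the first exit whose softmax confidence clears a threshold. The architecture, dimensions, and threshold value are illustrative assumptions for exposition, not the paper's actual models or training strategies.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

class ToyEarlyExitNet:
    """Toy early-exit network: each block is a random linear layer with a
    tanh nonlinearity, and each exit is a linear classifier attached to the
    features produced so far. Weights are random, purely for illustration."""

    def __init__(self, in_dim=8, hidden=8, n_classes=10, n_blocks=3):
        self.blocks = [rng.normal(size=(hidden, in_dim if i == 0 else hidden))
                       for i in range(n_blocks)]
        self.exits = [rng.normal(size=(n_classes, hidden))
                      for _ in range(n_blocks)]

    def predict(self, x, threshold=0.5):
        """Run blocks in sequence; return (predicted class, exit index) at
        the first exit whose max softmax probability >= threshold. The final
        exit always fires, so every input receives a prediction."""
        h = x
        for i, (block, head) in enumerate(zip(self.blocks, self.exits)):
            h = np.tanh(block @ h)          # compute this block's features
            probs = softmax(head @ h)       # this exit's class distribution
            if probs.max() >= threshold or i == len(self.blocks) - 1:
                return int(probs.argmax()), i
```

A low threshold makes most inputs leave at the first exit (cheap, less accurate), while a threshold above 1 forces every input through the full network; the training strategies compared in the paper determine how well each intermediate head performs under such a policy.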