Joint or Disjoint: Mixing Training Regimes for Early-Exit Models

Piotr Kubaty; Bartłomiej Tomasz Krzepkowski; Bartosz Wójcik; Monika Michaluk; Franciszek Szarwacki; Tomasz Trzcinski; Jary Pomponi; Kamil Adamczewski

21 Sept 2024 (modified: 05 Feb 2025), Submitted to ICLR 2025, CC BY 4.0

Keywords: early-exit, efficient AI, conditional computation

Abstract: Early exits are an important efficiency mechanism integrated into deep neural networks that allows the network's forward pass to terminate before all of its layers have been processed. Early-exit methods add trainable internal classifiers, which changes the training dynamics. However, the approaches used to train early-exit models have not been verified consistently, and there is little understanding of how training regimes optimize the architecture. Most early-exit methods employ a training strategy that either trains the backbone network and the exit heads simultaneously or trains the exit heads separately. We propose a training approach in which the backbone is first trained on its own, followed by a phase in which the backbone and the exit heads are trained together. We thus group early-exit training strategies into three distinct categories and validate them for performance and efficiency. In this benchmark, we perform both theoretical and empirical analysis of early-exit training regimes. We study the methods in terms of information flow, loss landscape, and numerical rank of activations, and we gauge the suitability of the regimes for various architectures and datasets.
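
The three categories described in the abstract can be summarized as training schedules. The following is a minimal sketch, not the authors' implementation: the regime labels `joint`, `disjoint`, and `mixed`, the `Phase` type, the helper `regime_phases`, and the choice of loss terms per phase are all hypothetical names and assumptions introduced for illustration. In particular, it assumes that in the separate-heads regime the backbone is frozen while the heads are trained.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    """One training phase (hypothetical structure, for illustration only)."""
    name: str
    trainable: frozenset  # parameter groups updated in this phase
    losses: tuple         # classifier outputs assumed to contribute to the loss


def regime_phases(regime: str) -> list:
    """Return the ordered training phases for a given regime label."""
    # Assumption: "backbone" includes the final classifier of the network.
    backbone_only = Phase("backbone", frozenset({"backbone"}), ("final",))
    together = Phase("joint", frozenset({"backbone", "heads"}), ("exits", "final"))

    if regime == "joint":
        # Backbone and exit heads are trained simultaneously.
        return [together]
    if regime == "disjoint":
        # Backbone first; then only the exit heads are trained
        # (assumed here: the backbone stays frozen in the second phase).
        return [backbone_only, Phase("heads", frozenset({"heads"}), ("exits",))]
    if regime == "mixed":
        # Backbone on its own first, then backbone and heads together.
        return [backbone_only, together]
    raise ValueError(f"unknown regime: {regime!r}")


if __name__ == "__main__":
    for regime in ("joint", "disjoint", "mixed"):
        steps = [f"{p.name}: train {sorted(p.trainable)}" for p in regime_phases(regime)]
        print(regime, "->", "; ".join(steps))
```

Written this way, the mixed regime differs from the disjoint one only in its second phase, where the backbone continues to be updated alongside the exit heads.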

Primary Area: other topics in machine learning (i.e., none of the above)

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 2377
