UBERT: Unsupervised adaptive early exits in BERT

24 Sept 2023 (modified: 25 Mar 2024), ICLR 2024 Conference Withdrawn Submission
Keywords: Early exits, Deep Neural Networks, BERT
TL;DR: Adaptive threshold learning to decide early exits during inference.
Abstract: Inference latency is a concern in large pre-trained networks such as BERT. To mitigate this, side branches are attached at intermediate layers so that inference can be performed early rather than only at the last layer. 'Easy' samples can then exit early, while only 'hard' samples pass through all layers, which reduces inference latency. However, the hardness of a sample is unknown a priori, which raises the question of how to decide exits so that accuracy and latency are well balanced. Moreover, the optimal choice of the parameters governing these exit decisions can depend on the sample domain and hence needs to be adapted. We develop an online learning algorithm named UBERT that decides whether a sample can exit early. A sample exits when the confidence of the inference at an exit point exceeds a threshold, and the algorithm simultaneously learns the optimal thresholds for all exits. UBERT learns these thresholds for the sample domain using only the confidences observed at the intermediate layers, without requiring any ground-truth labels. We perform extensive experiments on five datasets with one and two early exits and compare against the case with no early exits, i.e., all samples exiting at the last layer. With one early exit, UBERT reduces inference time by 10\%-53\% with a drop in accuracy in the range of 0.3\%-5.7\%. With two exits, the time reduction increases to 32\%-70\% with only a marginal accuracy drop of 0.1\%-3.9\%. The anonymized source code is available at https://anonymous.4open.science/r/UBERT-F2DF/README.md.
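
The sketch below illustrates the confidence-thresholded early-exit mechanism described in the abstract: a backbone with side branches, an exit taken at the first branch whose softmax confidence clears its threshold, and an online threshold update driven only by observed confidences. It is a minimal illustration, not the authors' implementation; the toy backbone, the exit positions, and in particular the threshold-update rule are assumptions made for clarity (the paper's actual update rule is not specified here).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Toy stand-ins for a BERT backbone with side branches. The layer count,
# hidden size, exit positions, and head design are illustrative assumptions.
num_layers, hidden_dim, num_classes = 12, 768, 2
layers = nn.ModuleList(nn.Linear(hidden_dim, hidden_dim) for _ in range(num_layers))
exit_points = [3, 7]                                    # layers with side branches
exit_heads = {i: nn.Linear(hidden_dim, num_classes) for i in exit_points}
final_head = nn.Linear(hidden_dim, num_classes)
thresholds = {i: 0.8 for i in exit_points}              # learned online in UBERT

def infer(x):
    """Exit at the first branch whose softmax confidence clears its threshold."""
    h = x
    confidences = {}
    for i, layer in enumerate(layers):
        h = torch.tanh(layer(h))
        if i in exit_heads:
            probs = F.softmax(exit_heads[i](h), dim=-1)
            conf, pred = probs.max(dim=-1)
            confidences[i] = conf.item()
            if conf.item() >= thresholds[i]:
                return pred.item(), i, confidences       # 'easy' sample: exit early
    probs = F.softmax(final_head(h), dim=-1)
    return probs.argmax(dim=-1).item(), num_layers - 1, confidences

# Purely illustrative unsupervised update (NOT the paper's rule): nudge each
# threshold toward the confidence observed at its exit, using no labels.
def update_thresholds(confidences, lr=0.05):
    for i, c in confidences.items():
        thresholds[i] += lr * (c - thresholds[i])

pred, exit_layer, confs = infer(torch.randn(1, hidden_dim))
update_thresholds(confs)
print(pred, exit_layer, thresholds)
```

Higher thresholds keep more samples in the backbone (favoring accuracy), while lower thresholds let more samples exit early (favoring latency); UBERT's contribution is learning this trade-off per exit point online, without ground-truth labels.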
Primary Area: general machine learning (i.e., none of the above)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 9356