PCEE-BERT: Accelerating BERT Inference via Patient and Confident Early Exiting

Anonymous

08 Mar 2022 (modified: 05 May 2023)
NAACL 2022 Conference Blind Submission
Readers: Everyone
Paper Link: https://openreview.net/forum?id=K_fV_YHD_D
Paper Type: Long paper (up to eight pages of content + unlimited references and appendices)
Abstract: BERT and other pretrained language models (PLMs) are ubiquitous in modern NLP. Even though PLMs are the state-of-the-art (SOTA) models for almost every NLP task \citep{Qiu2020PretrainedMF}, their significant inference latency prohibits wider industrial usage. In this work, we propose \underline{P}atient and \underline{C}onfident \underline{E}arly \underline{E}xiting BERT (PCEE-BERT), an off-the-shelf sample-dependent early exiting method that can work with different PLMs and can also work alongside popular model compression methods. With a multi-exit BERT as the backbone model, PCEE-BERT makes the early exiting decision if a sufficient number (the patience parameter) of consecutive intermediate layers are confident about their predictions. The entropy value measures the confidence level of an intermediate layer's prediction. Experiments on the GLUE benchmark demonstrate that our method outperforms previous SOTA early exiting methods. Ablation studies show that: (a) our method performs consistently well on other PLMs, such as ALBERT and TinyBERT; (b) PCEE-BERT can achieve different speed-up ratios by adjusting the patience parameter and the confidence threshold. The code for PCEE-BERT can be found at \url{https://github.com/michael-wzhu/PCEE-BERT}.
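
For illustration, below is a minimal sketch of the patient-and-confident exit criterion described in the abstract: an intermediate classifier is treated as confident when the entropy of its prediction falls below a threshold, and the model exits once a patience-sized run of consecutive confident layers is observed. The function names, threshold, and patience values are illustrative assumptions, not taken from the PCEE-BERT repository.

```python
import torch
import torch.nn.functional as F


def prediction_entropy(logits: torch.Tensor) -> torch.Tensor:
    """Entropy of the softmax distribution over class logits (lower = more confident)."""
    probs = F.softmax(logits, dim=-1)
    log_probs = F.log_softmax(logits, dim=-1)
    return -(probs * log_probs).sum(dim=-1)


def should_exit_early(layer_logits, entropy_threshold=0.15, patience=2):
    """Return (exit_layer_index, logits) at the first layer where `patience`
    consecutive intermediate classifiers are all below the entropy threshold;
    otherwise fall through to the final layer.

    layer_logits: list of per-layer classifier logits (each of shape [num_labels]),
    ordered from the first intermediate exit to the final exit.
    """
    confident_streak = 0
    for layer_idx, logits in enumerate(layer_logits):
        if prediction_entropy(logits).item() < entropy_threshold:
            confident_streak += 1
        else:
            confident_streak = 0  # confidence must be consecutive
        if confident_streak >= patience:
            return layer_idx, logits  # early exit at this layer
    return len(layer_logits) - 1, layer_logits[-1]  # no early exit: use the last layer
```

As the abstract notes, raising the patience or lowering the entropy threshold trades speed-up for accuracy, so the two hyperparameters jointly control the achieved speed-up ratio.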
Presentation Mode: This paper will be presented virtually
Virtual Presentation Timezone: UTC-8
Copyright Consent Signature (type Name Or NA If Not Transferrable): Zhen Zhang
Copyright Consent Name And Address: Ajou University, Suwon, South Korea