A Contrastive Self-distillation BERT with Kernel Alignment-Based Inference

Yangyan Xu, Fangfang Yuan, Cong Cao, Majing Su, Yuhai Lu, Yanbing Liu

Published: 2023, Last Modified: 20 Mar 2026, ICCS (1) 2023, CC BY-SA 4.0
Abstract: Early exit, an effective method for accelerating pre-trained language models, has recently attracted much attention in natural language processing. However, existing early exit methods are only suitable for low acceleration ratios, for two reasons: (1) the shallow classifiers in the model lack semantic information, and (2) exit decisions in the intermediate layers are unreliable. To address these issues, we propose a Contrastive self-distillation BERT with kernel alignment-based inference (CsdBERT), which aims to let shallow classifiers learn deep semantic knowledge so that they can make comprehensive predictions. Specifically, we divide the early exit classifiers into teachers and students based on their classification loss, to distinguish the representation ability of the classifiers. First, we present a contrastive learning approach between teacher and student classifiers to maintain the consistency of class similarity between them. Then, we introduce a self-distillation strategy between these two kinds of classifiers to solidify learned knowledge and accumulate new knowledge. Finally, we design a kernel alignment-based exit mechanism that identifies samples of different difficulty in order to accelerate BERT inference. Experimental results on the GLUE and ELUE benchmarks show that CsdBERT not only achieves state-of-the-art performance, but also maintains \(95\%\) performance at \(4\times\) speed.
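For readers unfamiliar with early exit, the general idea can be illustrated with a minimal sketch. It does not implement the kernel alignment-based exit mechanism proposed here; as a stand-in it uses a simple prediction-entropy threshold, a common confidence criterion in early exit work. The function names, the threshold value, and the per-layer logits are hypothetical and chosen only for illustration.

```python
import math


def softmax(logits):
    """Convert raw classifier scores into probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; low entropy means a confident prediction."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def early_exit_predict(per_layer_logits, threshold):
    """Return (predicted_class, exit_layer), with layers counted from 1.

    `per_layer_logits` is an iterable yielding the logits of the classifier
    attached to each successive layer. Passing a generator means deeper
    layers are only evaluated if the shallower ones were not confident.
    Assumption: exit happens as soon as the entropy drops below `threshold`;
    if that never happens, the last classifier's prediction is used.
    """
    probs, layer = None, 0
    for layer, logits in enumerate(per_layer_logits, start=1):
        probs = softmax(logits)
        if entropy(probs) < threshold:
            break
    if probs is None:
        raise ValueError("per_layer_logits must yield at least one layer")
    return probs.index(max(probs)), layer


# Hypothetical logits: an "easy" sample is confident at the first layer,
# a "hard" sample only becomes confident at the third.
easy = [[4.0, 0.0], [5.0, 0.0]]
hard = [[0.1, 0.0], [0.2, 0.0], [0.0, 3.0]]
print(early_exit_predict(easy, threshold=0.3))
print(early_exit_predict(hard, threshold=0.3))
```

In this sketch, easy samples leave the network at shallow layers and hard samples continue deeper, which is where the inference speedup comes from. A lower threshold makes exits rarer and more conservative; a higher one trades accuracy for speed.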