EEformer: Early Exiting for Transformer with Global-Local Exits and Progressive Fine-Tuning

Guanyu Xu, Jiawei Hao, Yong Luo, Li Shen, Han Hu, Dan Zeng

Published: 01 Jan 2025, Last Modified: 25 Jan 2026. IEEE Transactions on Multimedia. License: CC BY-SA 4.0
Abstract: The efficient deployment and acceleration of transformer-based pre-trained models (TPMs) on resource-constrained edge devices for multimedia services has recently attracted significant interest. Early exiting is a feasible solution, but it can incur extra computational cost and substantial performance degradation relative to the original models. To tackle these issues, we propose a framework termed EEformer, which incorporates global-local heads (GLHs) into intermediate layers to construct an early exiting dynamic neural network (EDNN). Each GLH efficiently extracts both global and local information from the hidden states produced by its backbone layer, giving the EDNN a better performance-efficiency trade-off. We further propose a novel progressive fine-tuning strategy that uses three fine-tuning stages to steadily improve the efficiency of the EDNN while keeping its performance comparable to that of the original model. Extensive experiments on image classification and natural language processing tasks demonstrate the superiority of the proposed framework. In particular, it achieves a 1.87× speed-up while maintaining 99.0% of the original performance on the CIFAR-100 dataset, and a 3.05× speed-up while maintaining 98.5% on the SST-2 dataset.
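The abstract describes exit heads only at a high level. The sketch below illustrates how inference in an early-exiting network with a head after every layer could work. It rests on assumptions that do not come from the paper. The "global" part of each head is mean pooling over tokens, the "local" part is element-wise max pooling, each head is a plain linear classifier, and a sample exits once the entropy of its prediction falls below a threshold. All names (`global_local_features`, `early_exit_forward`, `threshold`) are hypothetical, and the progressive fine-tuning stages are not modeled.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Tokens = List[Vector]  # one hidden-state vector per token


def global_local_features(hidden: Tokens) -> Vector:
    """Hypothetical GLH pooling (an assumption, not the paper's design):
    mean over tokens (global) concatenated with element-wise max (local)."""
    dim = len(hidden[0])
    mean = [sum(tok[d] for tok in hidden) / len(hidden) for d in range(dim)]
    peak = [max(tok[d] for tok in hidden) for d in range(dim)]
    return mean + peak


def softmax(logits: Sequence[float]) -> Vector:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: Sequence[float]) -> float:
    return -sum(p * math.log(p) for p in probs if p > 0)


def linear(weights: List[Vector], x: Vector) -> Vector:
    # weights holds one row per class; each row has len(x) entries
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def early_exit_forward(
    tokens: Tokens,
    layers: List[Callable[[Tokens], Tokens]],
    heads: List[List[Vector]],
    threshold: float,
) -> Tuple[int, int]:
    """Run backbone layers one at a time; after each layer its exit head
    classifies the hidden states, and inference stops as soon as the
    prediction entropy drops below `threshold` (the last layer always exits).
    Returns (predicted class, number of layers executed)."""
    if not layers or len(layers) != len(heads):
        raise ValueError("need one exit head per layer and at least one layer")
    hidden = tokens
    for depth, (layer, head) in enumerate(zip(layers, heads), start=1):
        hidden = layer(hidden)
        probs = softmax(linear(head, global_local_features(hidden)))
        if entropy(probs) < threshold or depth == len(layers):
            return max(range(len(probs)), key=probs.__getitem__), depth
    raise AssertionError("unreachable: the last layer always exits")


# Toy usage: every "layer" doubles the hidden states, and every exit head
# scores class 0 from feature slots 0 and 2, class 1 from slots 1 and 3.
toy_tokens = [[1.0, -1.0], [0.5, 0.0]]
toy_layers = [lambda h: [[2.0 * v for v in tok] for tok in h]] * 3
toy_heads = [[[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]] * 3
print(early_exit_forward(toy_tokens, toy_layers, toy_heads, threshold=0.01))
# → (0, 2)
```

In the toy run, the head after layer 1 is not yet confident enough (entropy about 0.06), so the sample exits after layer 2 of 3. A looser threshold exits earlier and saves more computation, while a stricter one runs deeper. This is the kind of speed-versus-accuracy trade-off that speed-up figures such as those above summarize.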