Abstract: The transformer model significantly improves performance on natural language processing tasks. However, a downside of transformer-based models is their heavy inference cost, which raises concerns about deploying them in industrial settings. Consequently, studies on improving inference performance have been reported. However, these studies mainly consider classification tasks rather than natural language generation (NLG). In this paper, we propose the Hierarchical Cache Transformer (HC-Transformer), a model tailored to NLG tasks. Our experiments show feasible results on a German-English translation dataset, demonstrating that HC-Transformer can speed up inference by 32% with a 3% loss in performance.