Abstract: The transformer model significantly improves performance on natural language processing tasks. However, a downside of transformer-based models is their heavy inference cost, which raises concerns about deploying them in industrial settings. Consequently, studies on improving inference performance have been reported. However, these studies mainly consider classification tasks rather than natural language generation (NLG). In this paper, we propose the Hierarchical Cache Transformer (HC-Transformer), a model tailored to NLG tasks. Our experiments show feasible results on a German-English translation dataset, demonstrating that HC-Transformer can speed up inference by 32% with a 3% loss in performance.