Abstract: Early-exiting models within the transformer architecture have been shown to increase efficiency in simultaneous speech-to-text translation with minimal reduction in accuracy. However, current encoder-based implementations evaluate inputs at the sequence level, averaging the computational needs of every token in the sequence. Moreover, models that do exit on a per-token basis are implemented in the decoder and use only a limited amount of information to decide when to exit. We address this issue by proposing Per Token Early Exiting, which inserts one-layer neural networks between encoder layers to determine which tokens should exit and which should be processed further. Our experiments on the MuST-C English-German and English-Spanish datasets show that the proposed model increases the BLEU score and/or decreases FLOPs during evaluation across multiple wait-k values. On the English-German language pair at wait-k 7, the proposed model increased the BLEU score by 1.74 over the baseline implementation, and across all wait-k values it decreased average FLOPs by 13.28% relative to the baseline.
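To make the mechanism described in the abstract concrete, the following is a minimal PyTorch sketch of a per-token exit gate placed between encoder layers. All names here (PerTokenExitGate, EarlyExitEncoder, exit_threshold) are illustrative assumptions, not the authors' implementation; for clarity the sketch masks the updates of exited tokens rather than removing them from the computation, whereas an efficient implementation would drop exited tokens from each layer's input to actually save FLOPs.

```python
# Hypothetical sketch of per-token early exiting in a Transformer encoder.
# Not the paper's implementation: names, threshold, and the masked-update
# formulation are assumptions for illustration only.
import torch
import torch.nn as nn


class PerTokenExitGate(nn.Module):
    """One-layer classifier scoring each token's readiness to exit."""

    def __init__(self, d_model: int):
        super().__init__()
        self.classifier = nn.Linear(d_model, 1)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq_len, d_model) -> per-token exit probability
        return torch.sigmoid(self.classifier(hidden)).squeeze(-1)


class EarlyExitEncoder(nn.Module):
    """Encoder whose later layers leave already-exited tokens unchanged."""

    def __init__(self, layers: nn.ModuleList, d_model: int,
                 exit_threshold: float = 0.9):
        super().__init__()
        self.layers = layers
        # One gate after every layer except the last.
        self.gates = nn.ModuleList(PerTokenExitGate(d_model) for _ in layers[:-1])
        self.exit_threshold = exit_threshold

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # exited[b, t] is True once token t of sequence b has exited.
        exited = torch.zeros(x.shape[:2], dtype=torch.bool, device=x.device)
        for i, layer in enumerate(self.layers):
            updated = layer(x)
            # Freeze representations of tokens that have already exited.
            x = torch.where(exited.unsqueeze(-1), x, updated)
            if i < len(self.gates):
                exited |= self.gates[i](x) > self.exit_threshold
            if exited.all():
                break  # every token has exited; skip the remaining layers
        return x


# Usage example with standard encoder layers (batch_first tensors).
layers = nn.ModuleList(
    nn.TransformerEncoderLayer(d_model=256, nhead=4, batch_first=True)
    for _ in range(6)
)
encoder = EarlyExitEncoder(layers, d_model=256)
out = encoder(torch.randn(2, 10, 256))  # (batch=2, seq_len=10, d_model=256)
```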
Paper Type: long
Research Area: Machine Learning for NLP
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English, German, and Spanish
Consent To Share Submission Details: On behalf of all authors, we agree to the terms above to share our submission details.