Coarse-to-Fine Output Predictions for Efficient Decoding in Neural Machine Translation

Published: 2022 · Last Modified: 12 Jun 2024 · ACM Trans. Asian Low Resour. Lang. Inf. Process., 2022 · CC BY-SA 4.0
Abstract: Neural Machine Translation (NMT) systems are undesirably slow because the decoder must compute probability distributions over large target vocabularies. In this work, we propose a coarse-to-fine approach that reduces the complexity of the decoding process using only the weight matrix of the softmax layer. In the coarse-grained phase, the large target vocabulary is first trimmed to a small candidate set; in the fine-grained phase, the final top-k results are generated from this candidate set. Tested separately on an RNN-based NMT system and a Transformer-based NMT system, our GPU-friendly method achieves a significant speed-up without harming translation quality.
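The abstract does not spell out how the coarse-grained phase is built from the softmax weight matrix, so the following is only a minimal illustrative sketch of the general coarse-to-fine idea, not the paper's algorithm. It assumes the rows of the softmax weight matrix `W` are grouped into clusters offline; at decode time, the coarse phase scores only the cluster centroids, and the fine phase computes exact logits over the words in the best clusters.

```python
import numpy as np

rng = np.random.default_rng(0)
V, d, C, k = 10000, 64, 100, 5      # vocab size, hidden dim, clusters, top-k

# Softmax weight matrix: one row per target-vocabulary word.
W = rng.standard_normal((V, d))

# --- Offline: coarse structure derived from W alone (hypothetical: random
# cluster assignment stands in for a real clustering of W's rows). ---
labels = rng.integers(0, C, size=V)
centroids = np.stack([W[labels == c].mean(axis=0) for c in range(C)])

def coarse_to_fine_topk(h, n_clusters=10):
    # Coarse phase: score only the C centroids, keep the best clusters'
    # words as the small candidate set.
    coarse_scores = centroids @ h
    best = np.argsort(coarse_scores)[-n_clusters:]
    candidates = np.flatnonzero(np.isin(labels, best))
    # Fine phase: exact logits over the candidate set only.
    fine_scores = W[candidates] @ h
    return candidates[np.argsort(fine_scores)[-k:][::-1]]

h = rng.standard_normal(d)          # decoder hidden state at one time step
approx = coarse_to_fine_topk(h)
exact = np.argsort(W @ h)[-k:][::-1]  # full-vocabulary baseline for comparison
print(approx, exact)
```

With 10 of 100 clusters kept, the fine phase scores roughly a tenth of the vocabulary, which is the source of the speed-up; setting `n_clusters=C` recovers the exact full-softmax top-k.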