- Keywords: Deep learning, natural language processing, Machine translation
- TL;DR: Improved training of wait-k decoders for online machine translation
- Abstract: Neural sequence-to-sequence models are at the basis of state-of-the-art solutions for sequential prediction problems such as machine translation and speech recognition. The models typically assume that the entire input is available when starting target generation. In some applications, however, it is desirable to start the decoding process before the entire input is available, e.g. to reduce the latency in automatic speech recognition. We consider state-of-the-art wait-k decoders, that first read k tokens from the source and then alternate between reading tokens from the input and writing to the output. We investigate the sensitivity of such models to the value of k that is used during training and when deploying the model, and the effect of updating the hidden states in transformer models as new source tokens are read. We experiment with German-English translation on the IWSLT14 dataset and the larger WMT15 dataset. Our results significantly improve over earlier state-of-the-art results for German-English translation on the WMT15 dataset across different latency levels.