Integrating and Evaluating Extra-linguistic Context in Neural Machine Translation


27 Jan 2019 (modified: 28 Jun 2019) · OpenReview Anonymous Preprint Blind Submission
  • Abstract: In Machine Translation (MT), taking into account information about the setting in which a text is produced can be crucial. We investigate the impact of different extra-linguistic factors (speaker gender, speaker age, film genre and film year) on the MT of subtitles. Our starting point is the pseudo-token approach (Sennrich, 2016). We explore adding multiple factors simultaneously, in various orders, to assess the limits of treating these factors as a sequence of words. We compare this approach to encoding the same factors with an additional, separate encoder. We evaluate both approaches using BLEU and a targeted evaluation of how well the context is used. Our results show that both strategies are well suited to exploiting such context. Contrary to our intuitions, the pseudo-token approach appears unaffected by the use of multiple values in various orders, and it also yields significant improvements in BLEU score (p<0.01). The multi-encoder approach proves more effective at integrating context but results in lower overall BLEU scores.
  • Keywords: machine translation, context
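The pseudo-token approach described in the abstract can be illustrated with a minimal Python sketch. It assumes each extra-linguistic factor is prepended to the source sentence as a single token of the form `<name=value>`. That token format, the helper names `make_pseudo_tokens` and `add_context`, and the example values (`30-40`, `1990s`, etc.) are hypothetical and are not taken from the paper. Changing the `order` argument mimics the experiments that feed the same factors in different orders.

```python
"""Hypothetical sketch of the pseudo-token approach: extra-linguistic
factors are prepended to the source sentence as extra tokens."""

from typing import Dict, List, Sequence

# Factor names follow the abstract; the "<name=value>" token format is an
# illustrative assumption, not the paper's actual format.
FACTORS = ("gender", "age", "genre", "year")


def make_pseudo_tokens(context: Dict[str, str], order: Sequence[str]) -> List[str]:
    """Turn a context dict into pseudo-tokens in the requested factor order.

    Factors absent from ``context`` are simply skipped.
    """
    return [f"<{name}={context[name]}>" for name in order if name in context]


def add_context(source: str, context: Dict[str, str],
                order: Sequence[str] = FACTORS) -> str:
    """Prepend context pseudo-tokens to a whitespace-tokenised source sentence."""
    return " ".join(make_pseudo_tokens(context, order) + source.split())


if __name__ == "__main__":
    # Hypothetical context values for one subtitle line.
    ctx = {"gender": "F", "age": "30-40", "genre": "drama", "year": "1990s"}
    print(add_context("I am tired .", ctx))
    # prints: <gender=F> <age=30-40> <genre=drama> <year=1990s> I am tired .
    # Same factors, reversed order, as in the "various orders" setting.
    print(add_context("I am tired .", ctx, order=("year", "genre", "age", "gender")))
```

Because the context is just extra tokens in the source sequence, the NMT model itself needs no architectural change. The multi-encoder alternative instead feeds the factors to a separate encoder.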