- Abstract: In Machine Translation (MT), taking into account information related to the setting in which a text is produced can be crucial. We investigate the impact of different extra-linguistic factors (speaker gender, speaker age, film genre and film year) on the MT of subtitles. Our starting point is the pseudo-token approach (Sennrich, 2016). We explore the simultaneous addition of multiple factors in various orders to assess the limits of treating these factors as a sequence of words. We compare this approach to the encoding of the same factors using an additional, separate encoder. We evaluate both using BLEU and a targeted evaluation of how well the context is used. Our results show that both strategies are well suited to exploiting such context. Contrary to our intuitions, the pseudo-token approach appears unperturbed by the use of multiple values in various orders, and also results in statistically significant improvements in BLEU score (p<0.01). The multi-encoder approach proves more effective at integrating context but results in lower overall BLEU scores.
- Keywords: machine translation, context
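
To illustrate what treating extra-linguistic factors as a sequence of words can look like in practice, here is a minimal sketch of pseudo-token preprocessing. The token format (`<name=value>`), the function name `add_pseudo_tokens` and the example factor values are assumptions made for illustration; they are not taken from the paper.

```python
def add_pseudo_tokens(source, factors, order):
    """Prepend one pseudo-token per factor to a source sentence.

    source  -- tokenised source sentence as a string
    factors -- dict mapping factor name to value (hypothetical names and values)
    order   -- list of factor names giving the order of the pseudo-tokens
    Factors named in `order` but missing from `factors` are skipped.
    """
    # Hypothetical token format: <name=value>
    tokens = [f"<{name}={factors[name]}>" for name in order if name in factors]
    return " ".join(tokens + [source])


# Illustrative values only
factors = {"gender": "f", "age": "adult", "genre": "comedy", "year": "1994"}

print(add_pseudo_tokens("I am so tired .", factors, ["gender", "genre"]))
# prints: <gender=f> <genre=comedy> I am so tired .
```

Changing `order` changes only the position of the pseudo-tokens in the input sequence, which is the kind of variation the abstract describes when it mentions adding multiple factors in various orders.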