Do We Need Source Context for Document-level Neural Machine Translation?

Anonymous

16 Oct 2023 · ACL ARR 2023 October Blind Submission · Readers: Everyone
Abstract: Standard context-aware neural machine translation (NMT) typically relies on parallel document-level data, exploiting both source and target contexts. In this work, we investigate whether source context can be dispensed with altogether within a standard concatenation-based approach to context-aware NMT, thereby enabling further use of monolingual data without requiring a dedicated NMT architecture. We propose a simple approach that prepends target-language context sentences to both the source sentence to be translated and the target reference sentence. We show that this method can yield significant improvements over a strong baseline on discourse-level phenomena that depend on target-language information, while achieving parity on phenomena where the relevant information is present in both source and target languages. Additionally, we show that target monolingual data can be better exploited via back-translation under this approach, and that using machine-translated target context does not significantly affect overall translation quality. We experiment on two language pairs, English-Russian and Basque-Spanish, for which challenge sets covering multiple contextual phenomena are available.
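The abstract describes building training examples by prepending the same target-language context to both sides of each sentence pair. The sketch below illustrates how such examples might be constructed; the separator token `<brk>`, the helper name `build_example`, and the example sentences are illustrative assumptions, not details taken from the paper.

```python
SEP = "<brk>"  # hypothetical token marking the boundary between context and current sentence

def build_example(src_sentence, tgt_sentence, tgt_context, k=1):
    """Prepend up to k preceding *target-language* sentences to both sides.

    src_sentence: the source sentence to translate
    tgt_sentence: its reference translation
    tgt_context:  preceding sentences of the target document, oldest first
    """
    context = tgt_context[-k:]  # keep the k most recent target sentences
    prefix = f" {SEP} ".join(context)
    if prefix:
        # The same target-language context is attached to the source input
        # and the target reference, so no source-side context is required.
        src = f"{prefix} {SEP} {src_sentence}"
        tgt = f"{prefix} {SEP} {tgt_sentence}"
    else:
        src, tgt = src_sentence, tgt_sentence
    return src, tgt

# Example (English->Russian): the Russian context sentence appears on both sides.
src, tgt = build_example(
    "She opened it.",
    "Она открыла её.",
    ["Мария нашла книгу."],  # previous target-document sentence
)
# src == "Мария нашла книгу. <brk> She opened it."
# tgt == "Мария нашла книгу. <brk> Она открыла её."
```

Because the context comes only from the target side, monolingual target documents can feed this format directly, e.g. by back-translating their sentences to obtain the source side, which is consistent with the back-translation result reported in the abstract.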
Paper Type: long
Research Area: Machine Translation
Languages Studied: English, Russian, Basque, Spanish
Consent To Share Submission Details: On behalf of all authors, we agree to the terms above to share our submission details.