When Does Translation Require Context? A Data-driven, Multilingual Exploration

Anonymous

16 Nov 2021 (modified: 05 May 2023) · ACL ARR 2021 November Blind Submission
Abstract: Although the proper handling of discourse phenomena contributes significantly to the quality of machine translation (MT), improvements on these phenomena are not adequately measured in common translation quality metrics. Recent work in context-aware MT attempts to target a small set of these phenomena during evaluation. In this paper, we propose a methodology to systematically identify translations that require context, and we use this methodology both to confirm the difficulty of previously studied phenomena and to uncover new ones that have not been addressed in prior work. We then develop the Multilingual Discourse-Aware (MuDA) benchmark, a series of taggers for these phenomena in 14 different language pairs, which we use to evaluate context-aware MT. We find that state-of-the-art context-aware MT models make only marginal improvements over context-agnostic models, which suggests that current models do not handle these ambiguities effectively. We will release code and data to invite the MT research community to increase efforts on context-aware translation for discourse phenomena and languages that are currently overlooked.