Detecting Source Contextual Barriers for Understanding Neural Machine Translation

Guanlin Li, Lemao Liu, Conghui Zhu, Rui Wang, Tiejun Zhao, Shuming Shi

2021 (modified: 30 Nov 2021), IEEE/ACM Trans. Audio Speech Lang. Process. 2021

Abstract: In machine translation evaluation, conventional practice measures a model's generalization ability in an average sense, for example with corpus BLEU. However, corpus-level BLEU statistics cannot provide a comprehensive understanding or a fine-grained analysis of a model's generalization ability. As a remedy, this paper attempts to understand NMT at a fine-grained level by detecting contextual barriers within an unseen input sentence that cause the degradation in the model's translation quality. It proposes a principled definition of source contextual barriers, as well as a modified version that is computationally tractable and operates at the word level. Based on the modified definition, three simple methods are proposed for barrier detection by search-aware risk estimation through counterfactual generation. Extensive analyses are conducted on the detected contextual barrier words on both Zh⇔En NIST benchmarks. Potential uses motivated by barrier words are also discussed.
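The abstract describes barrier detection only at a high level. As a rough illustration of the general idea of word-level risk estimation through counterfactual generation, and not a reproduction of any of the paper's three methods, the sketch below substitutes alternative words at one source position at a time, re-translates each counterfactual sentence, and scores the position by the average drop in estimated risk. Everything in it is an assumption: `barrier_scores`, the `candidates` dictionary, and the toy `toy_translate`/`toy_risk` functions (a word-by-word lexicon and an unknown-word rate) are hypothetical stand-ins for a trained NMT model's decoder output and a real quality estimate.

```python
from typing import Callable, Dict, List, Sequence


def barrier_scores(
    src: List[str],
    candidates: Dict[int, Sequence[str]],
    translate: Callable[[List[str]], List[str]],
    risk: Callable[[List[str], List[str]], float],
) -> List[float]:
    """Hypothetical word-level barrier score via counterfactual substitution.

    For each source position, replace its word with each candidate, translate
    the counterfactual sentence, and return the average reduction in risk
    relative to the original sentence (higher = more barrier-like).
    """
    base_risk = risk(src, translate(src))
    scores: List[float] = []
    for i in range(len(src)):
        subs = candidates.get(i, ())
        if not subs:
            # No counterfactuals available for this position: no evidence.
            scores.append(0.0)
            continue
        cf_risks = []
        for word in subs:
            counterfactual = src[:i] + [word] + src[i + 1:]
            cf_risks.append(risk(counterfactual, translate(counterfactual)))
        scores.append(base_risk - sum(cf_risks) / len(cf_risks))
    return scores


# Toy stand-ins (assumptions, not an NMT system): a word-by-word lexicon as the
# "translator" and the fraction of untranslatable tokens as the "risk".
LEXICON = {"我": "I", "喜欢": "like", "猫": "cats", "狗": "dogs"}


def toy_translate(tokens: List[str]) -> List[str]:
    return [LEXICON.get(t, "<unk>") for t in tokens]


def toy_risk(src: List[str], hyp: List[str]) -> float:
    return hyp.count("<unk>") / max(len(hyp), 1)


src = ["我", "喜欢", "猞猁"]  # the last word is missing from the toy lexicon
scores = barrier_scores(src, {2: ["猫", "狗"]}, toy_translate, toy_risk)
print([round(s, 3) for s in scores])  # prints [0.0, 0.0, 0.333]
```

Here the rare word at position 2 receives the highest score because replacing it removes the only untranslatable token. In a realistic setting, `translate` would run the NMT model's decoder and `risk` would have to be estimated without a reference, since the input sentence is unseen.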
