Evaluating Machine Translation Systems with Second Language Proficiency Tests

ACL (2) 2015 (modified: 16 Jul 2019)
Abstract: A lightweight, human-in-the-loop evaluation scheme for machine translation (MT) systems is proposed. It extrinsically evaluates MT systems using human subjects' scores on second-language ability test problems that are machine-translated into the subjects' native language. A large-scale experiment involving 320 subjects revealed that the context-unawareness of current MT systems severely damages human performance on the test problems, while one of the evaluated MT systems performed as well as a human translation produced in a context-unaware condition. An analysis of the experimental results showed that the extrinsic evaluation captured a dimension of translation quality different from that captured by manual and automatic intrinsic evaluation.
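The extrinsic evaluation described above amounts to aggregating subjects' test scores per translation condition and comparing the resulting means. Below is a minimal sketch of that aggregation with a percentile-bootstrap confidence interval for each mean; the data, condition names, and the `bootstrap_ci` helper are hypothetical illustrations, not taken from the paper.

```python
# Sketch: aggregate per-subject test accuracies by translation condition
# and report a mean with a percentile-bootstrap 95% CI. All numbers and
# condition names below are made up for illustration.
import random
import statistics

# Hypothetical per-subject accuracies (fraction of test problems answered
# correctly), grouped by the translation each subject's test used.
scores_by_condition = {
    "mt_system_a": [0.55, 0.62, 0.48, 0.60, 0.58],
    "mt_system_b": [0.70, 0.66, 0.74, 0.69, 0.71],
    "human_context_unaware": [0.72, 0.68, 0.75, 0.70, 0.73],
}

def bootstrap_ci(values, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    means = sorted(
        statistics.mean(rng.choices(values, k=len(values)))
        for _ in range(n_resamples)
    )
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

for condition, scores in scores_by_condition.items():
    mean = statistics.mean(scores)
    lo, hi = bootstrap_ci(scores)
    print(f"{condition}: mean accuracy {mean:.3f} (95% CI [{lo:.3f}, {hi:.3f}])")
```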