Comparative Quality Assessment of Human and Machine Translation with Best-Worst Scaling

Bettina Hiebl; Dagmar Gromann

Published: 01 Jan 2024, Last Modified: 20 Feb 2025, Venue: EAMT (1) 2024, License: CC BY-SA 4.0

Abstract: Translation quality and its assessment are of great importance in both human and machine translation. Methods range from human annotation and assessment to quality metrics and estimation; the former are rather time-consuming. Furthermore, assessing translation quality is a subjective process. Best-Worst Scaling (BWS) is a time-efficient annotation method for obtaining subjective preferences, namely the best and the worst item in a given set, together with their ratings. In this paper, we propose using BWS for a comparative translation quality assessment of one human and three machine translations into German of the same English source text. Ten participants with a translation background selected the human translation most frequently and rated it best overall, closely followed by DeepL. Participants showed an overall positive attitude towards this assessment method.
