The Metrics Maze: Navigating the Landscape of Evaluation Techniques for MT Systems

Robin Guillot, Henri Upton

22 Mar 2023, OpenReview Archive Direct Upload, Readers: Everyone

Abstract: Over the years, a growing number of untrained text similarity metrics have emerged, while the tasks they are applied to, such as text summarization, story writing, and translation, have multiplied. Unlike trained metrics, untrained metrics do not depend on a training set and are expected to perform well in any context. It is therefore increasingly important to compare these metrics and identify clearly which ones perform best. In this paper, we focus on Machine Translation (MT) and benchmark the sentence-level correlation of the main existing metrics with human scores, using the WMT22 (Conference on Machine Translation) dataset.
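To make "sentence-level correlation with human scores" concrete, the following is a minimal sketch, not the authors' evaluation code. It assumes Kendall's tau (the tau-a variant) as the correlation coefficient, which the abstract does not specify, and it uses made-up scores for four sentences. The function name `kendall_tau_a` and the lists `human`, `metric_a`, and `metric_b` are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def kendall_tau_a(metric_scores, human_scores):
    """Kendall's tau-a between two equally long lists of sentence scores.

    A pair of sentences is concordant when both score lists order it the
    same way and discordant when they order it in opposite ways. Tied pairs
    count as neither. The result lies in [-1, 1].
    """
    if len(metric_scores) != len(human_scores):
        raise ValueError("score lists must have the same length")
    n = len(metric_scores)
    if n < 2:
        raise ValueError("at least two sentences are required")

    concordant = 0
    discordant = 0
    for i, j in combinations(range(n), 2):
        product = (metric_scores[i] - metric_scores[j]) * (
            human_scores[i] - human_scores[j]
        )
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1

    total_pairs = n * (n - 1) / 2
    return (concordant - discordant) / total_pairs


# Hypothetical toy data: one score per translated sentence (not from WMT22).
human = [0.9, 0.4, 0.7, 0.1]      # assumed human judgments
metric_a = [0.8, 0.5, 0.6, 0.2]   # ranks the sentences exactly like the humans
metric_b = [0.3, 0.9, 0.2, 0.6]   # mostly disagrees with the humans

tau_a = kendall_tau_a(metric_a, human)
tau_b = kendall_tau_a(metric_b, human)
print(f"metric_a: {tau_a:.3f}")
print(f"metric_b: {tau_b:.3f}")
```

Under this assumption, a metric whose sentence ranking matches the human ranking scores close to 1, and one that tends to invert it scores below 0. A real benchmark would read segment-level human scores from the dataset and might use a different coefficient or a tie-aware variant.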
