Translated Benchmarks Can Be Misleading: the Case of Estonian Question Answering

Published: 20 Mar 2023, Last Modified: 18 Apr 2023, NoDaLiDa 2023, Readers: Everyone
Keywords: benchmarks, question-answering, translated datasets
TL;DR: The paper compares native and translated Estonian QA test datasets and finds that the translated test dataset overestimates model performance on the native dataset.
Abstract: Translated test datasets are a popular and cheaper alternative to native test datasets. However, one of the properties of translated data is the existence of cultural knowledge unfamiliar to the target language speakers. This can make translated test datasets differ significantly from native target datasets. As a result, we might inaccurately estimate the performance of the models in the target language. In this paper, we use both native and translated Estonian QA datasets to study this topic more closely. We discover that relying on the translated test dataset results in an overestimation of the model's performance on native Estonian data.
Student Paper: Yes, the first author is a student