A New Benchmark for Kalaallisut-Danish Neural Machine Translation

Ross Deans Kristensen-McLachlan, Johanne Sofie Krog Nedergård

Published: 01 Jan 2024, Last Modified: 19 Jan 2026. Proceedings of the 4th Workshop on NLP for Indigenous Languages of the Americas. License: CC BY-SA 4.0

Abstract: Kalaallisut, also known as (West) Greenlandic, poses a number of unique challenges to contemporary natural language processing (NLP). In particular, the language has historically lacked benchmark datasets and robust evaluation for specific NLP tasks such as neural machine translation (NMT). In this paper, we present a new benchmark dataset for Greenlandic-to-Danish NMT comprising over 1.2 million words of Greenlandic and 2.1 million words of parallel Danish translation. We report initial metrics for models trained on this dataset and conclude by suggesting how these findings can be carried forward to other NLP tasks for the Greenlandic language.

External IDs: DOI 10.18653/v1/2024.americasnlp-1.7