Abstract: This paper introduces a Scandinavian benchmarking platform, ScandEval, which can benchmark any pretrained or finetuned model on 29 datasets, two of which are new, in Danish, Norwegian, Swedish, Icelandic and Faroese. We develop and release a Python package and Command-Line Interface (CLI), scandeval, which can benchmark any model that has been uploaded to the HuggingFace Hub, with reproducible results. Using this package, we benchmark over 60 Scandinavian or multilingual models and present the results in an interactive online leaderboard. The benchmarking results show that the investment in language technology in Norway and Sweden has led to language models that outperform multilingual models such as XLM-RoBERTa and LaBSE. We release the source code for both the package and the leaderboard.
Paper Type: long