Superlim: A Swedish Language Understanding Evaluation Benchmark

Aleksandrs Berdicevskis; Gerlof Bouma; Robin Kurtz; Felix Morger; Joey Öhman; Yvonne Adesam; Lars Borin; Dana Dannélls; Markus Forsberg; Tim Isbister; Anna Lindahl; Martin Malmsten; Faton Rekathati; Magnus Sahlgren; Elena Volodina; Love Börjeson; Simon Hengchen; Nina Tahmasebi

Published: 07 Oct 2023, Last Modified: 01 Dec 2023

Venue: EMNLP 2023 Main

Submission Type: Regular Long Paper

Submission Track: Resources and Evaluation

Submission Track 2: Interpretability, Interactivity, and Analysis of Models for NLP

Keywords: Swedish, benchmark, large language models, natural language understanding, transfer learning, evaluation

Abstract: We present Superlim, a multi-task NLP benchmark and analysis platform for evaluating Swedish language models, a counterpart to the English-language (Super)GLUE suite. We describe the dataset, the tasks, and the leaderboard, and report baseline results from a reference implementation. The tested models do not approach ceiling performance on any of the tasks, which suggests that Superlim is genuinely difficult, a desirable quality for a benchmark. We address methodological challenges, such as mitigating Anglocentric bias when creating datasets for a less-resourced language, choosing the most appropriate measures, documenting the datasets, and making the leaderboard convenient and transparent. We also highlight other potential uses of the dataset, such as the evaluation of cross-lingual transfer learning.

Submission Number: 1083
