NorQuAD: Norwegian Question Answering Dataset

Published: 20 Mar 2023, Last Modified: 21 Apr 2024
NoDaLiDa 2023
Readers: Everyone
Keywords: question answering, extractive question answering, question answering dataset, machine reading comprehension, Norwegian
TL;DR: The first Norwegian question answering dataset for machine reading comprehension, with over 4700 manually created question-answer pairs.
Abstract: In this paper, we present NorQuAD, the first Norwegian question answering dataset for machine reading comprehension. The dataset consists of 4,752 manually created question-answer pairs. We detail the data collection procedure and present statistics of the dataset. We also benchmark several multilingual and Norwegian monolingual language models on the dataset and compare them against human performance. The dataset will be made freely available.
Student Paper: Yes, the first author is a student
Community Implementations: [4 code implementations](https://www.catalyzex.com/paper/arxiv:2305.01957/code)