Is Part-of-Speech Tagging a Solved Problem for Icelandic?

Published: 20 Mar 2023, Last Modified: 17 Apr 2023
Venue: NoDaLiDa 2023
Readers: Everyone
Keywords: Part-of-Speech Tagging, Icelandic, Transformer, ConvBERT, error analysis, annotator disagreement, annotation errors
TL;DR: After two decades of POS tagging for Icelandic we are nearing the finish line.
Abstract: We train and evaluate four Part-of-Speech tagging models for Icelandic. Three are older models that obtained the highest accuracy for Icelandic when they were introduced. The fourth model is of a type that currently reaches state-of-the-art accuracy. We use the most recent version of the MIM-GOLD training/testing corpus, its newest tagset, and augmentation data to obtain results that are comparable between the various models. We examine the accuracy improvements with each model and analyse the errors produced by our transformer model, which is based on a previously published ConvBERT model. For the set of errors that all the models make, and for which they predict the same tag, we extract a random subset for manual inspection. Extrapolating from this subset, we obtain a lower bound estimate on annotation errors in the corpus as well as on some unsolvable tagging errors. We argue that further tagging accuracy gains for Icelandic can still be obtained by fixing the errors in MIM-GOLD and, furthermore, that it should still be possible to squeeze out some small gains from our transformer model.
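The selection and extrapolation steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the toy tags, and the counts in the usage lines are hypothetical, and the simple proportional scaling is an assumption about how the extrapolation could be done. Because only tokens inside the shared-error set are counted, an estimate of this kind says nothing about annotation errors elsewhere in the corpus, which is consistent with treating it as a lower bound.

```python
import random


def shared_errors(gold, predictions):
    """Return indices where every model is wrong and all models agree on the tag.

    gold: list of gold-standard tags, one per token.
    predictions: list of per-model tag lists, each aligned with gold.
    """
    shared = []
    for i, gold_tag in enumerate(gold):
        predicted = {model_tags[i] for model_tags in predictions}
        # One distinct predicted tag means all models agree;
        # it is an error if that tag differs from the gold tag.
        if len(predicted) == 1 and gold_tag not in predicted:
            shared.append(i)
    return shared


def sample_for_inspection(indices, k, seed=0):
    """Draw a reproducible random subset of at most k indices for manual review."""
    rng = random.Random(seed)
    return sorted(rng.sample(indices, min(k, len(indices))))


def extrapolate(num_shared, num_sampled, num_flagged):
    """Scale the count flagged in the inspected sample up to the full shared-error set."""
    if num_sampled == 0:
        return 0.0
    return num_shared * num_flagged / num_sampled


# Toy example with hypothetical tags and four hypothetical models.
gold = ["N", "V", "A", "N", "V", "C"]
model_outputs = [
    ["N", "A", "A", "V", "V", "C"],
    ["N", "A", "N", "V", "V", "C"],
    ["N", "A", "A", "V", "N", "C"],
    ["N", "A", "A", "V", "V", "A"],
]

candidates = shared_errors(gold, model_outputs)
to_inspect = sample_for_inspection(candidates, k=5)

# Hypothetical counts: 200 shared errors, 50 inspected, 20 judged annotation errors.
estimated_annotation_errors = extrapolate(200, 50, 20)
```

In the toy data only the tokens at indices 1 and 3 qualify, since those are the only positions where all four models predict the same tag and that tag differs from the gold tag.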
Student Paper: Yes, the first author is a student