Estonian Named Entity Recognition: New Datasets and Models

Kairit Sirts

Published: 20 Mar 2023. Last Modified: 18 Apr 2023. Venue: NoDaLiDa 2023. Readers: Everyone.

Keywords: named entity recognition, NER, Estonian language

TL;DR: This paper describes the annotation of two Estonian NER datasets and reports experimental results on them using a transformer-based model.

Abstract: This paper presents the annotation process for two Estonian named entity recognition (NER) datasets, including the creation of annotation guidelines for labeling eleven entity types. In addition to commonly annotated entity types such as person names, organization names, and locations, the annotation scheme covers geopolitical entities, product names, titles/roles, events, dates, times, monetary values, and percentages. The annotation was performed on two datasets: one is a reannotation of an existing NER dataset composed primarily of news texts, and the other consists of new texts from the news and social media domains. Transformer-based models were trained on these annotated datasets to establish baseline predictive performance. Our findings indicate that the best results were achieved by training a single model on the combined dataset, suggesting that the domain differences between the datasets are relatively small.
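The abstract lists eleven entity types. As a hedged illustration of how such an inventory is commonly turned into token-level labels for a transformer token-classification model, the sketch below assumes a BIO tagging scheme and hypothetical short codes (`PER`, `ORG`, `GPE`, and so on). Neither the codes nor the tagging scheme are taken from the paper.

```python
# Hypothetical short codes for the eleven entity types named in the abstract;
# the paper's actual label strings and tagging scheme may differ.
ENTITY_TYPES = {
    "PER": "person name",
    "ORG": "organization name",
    "LOC": "location",
    "GPE": "geopolitical entity",
    "PROD": "product name",
    "TITLE": "title/role",
    "EVENT": "event",
    "DATE": "date",
    "TIME": "time",
    "MONEY": "monetary value",
    "PERCENT": "percent",
}


def bio_labels(entity_types):
    """Build an (assumed) BIO tag inventory: 'O' plus B-/I- tags per entity type."""
    labels = ["O"]
    for code in entity_types:
        labels.append(f"B-{code}")  # first token of an entity span
        labels.append(f"I-{code}")  # subsequent tokens of the same span
    return labels


LABELS = bio_labels(ENTITY_TYPES)
# Integer mappings of the kind a token-classification output layer is trained on.
label2id = {label: i for i, label in enumerate(LABELS)}
id2label = {i: label for label, i in label2id.items()}

if __name__ == "__main__":
    print(len(ENTITY_TYPES))  # 11
    print(len(LABELS))        # 23
```

Under this assumed scheme, eleven types yield 2 × 11 + 1 = 23 labels, so a model's classification layer would need 23 outputs.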
