Abstract: This report describes the KU Leuven / Brepols-CTLO submission to EvaLatin 2022. We present the results of our current
small Latin ELECTRA model, which will be expanded to a larger model in the future. For the lemmatization task, we
combine a neural token-tagging approach with the in-house rule-based lemma lists from Brepols’ ReFlex software. The results
are decent, but suffer from inconsistencies between Brepols' and EvaLatin's definitions of a lemma. For POS-tagging, our
results fall just short of first place in this competition, with proper nouns being the main source of errors. For morphological
tagging, there is much more room for improvement. Here, the constraints added to our multiclass multilabel model were often
not tight enough, causing morphological features to go missing. We will further investigate why combining the different
morphological features, each of which performs well on its own, leads to these issues.