Automatic Animacy Classification for Romanian Nouns

Published: 01 Jan 2024, Last Modified: 07 Jun 2024LREC/COLING 2024EveryoneRevisionsBibTeXCC BY-SA 4.0
Abstract: We introduce the first Romanian animacy classifier, specifically a type-based binary classifier of Romanian nouns into the classes human/non-human, using pre-trained word embeddings and animacy information derived from Romanian WordNet. By obtaining a seed set of labeled nouns and their embeddings, we are able to train classifiers that generalize to unseen nouns. We compare three different architectures and observe good performance on classifying word types. In addition, we manually annotate a small corpus for animacy to perform a token-based evaluation of Romanian animacy classification in a naturalistic setting, which reveals limitations of the type-based classification approach.
Loading