Principal Parts Detection for Computational Morphology: Task, Models and Benchmark

Published: 24 May 2025, Last Modified: 24 May 2025, Venue: CoNLL 2025, License: CC BY 4.0
Keywords: Principal Parts Detection, Computational Morphology, Inflectional Paradigms, Paradigm Completion, Morphological Modeling, Unsupervised Learning, Clustering Methods, Morphological Inflection, Morphological Reinflection, Linguistic Typology, Natural Language Processing (NLP)
TL;DR: We introduce a computational framework for detecting principal parts in inflectional paradigms, leveraging linguistic insights and modeling techniques across ten typologically diverse languages.
Abstract: The principal parts of an inflectional paradigm, defined as the minimal set of paradigm cells required to deduce all others, are an important concept in theoretical morphology. This concept, which captures the minimal memorization needed for a perfect inflector, has been largely overlooked in computational morphology despite impressive advances in the field over the last decade. In this work, we pose Principal Parts Detection as a computational task and construct a multilingual dataset of verbal principal parts covering ten languages, based on Wiktionary entries. We evaluate an array of Principal Parts Detection methods, all of which follow the same schema: characterize the relationships between each pair of inflectional categories, cluster the resulting vector representations, and select a representative of each cluster as a predicted principal part. Our best-performing model, based on edit scripts between inflections and using Hierarchical K-Means, achieves an F1 score of 55.05%, significantly outperforming a random baseline of 21.20%. While our results demonstrate that some success is achievable, further work is needed to fully solve Principal Parts Detection, a task that may be used to further optimize inputs for morphological inflection and to promote research into the theoretical and practical importance of a compact representation of morphological paradigms.
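The three-step schema described in the abstract (characterize cell pairs, cluster, pick one representative per cluster) can be illustrated with a minimal sketch. This is not the paper's implementation. The toy English paradigms, the cell names, the `ambiguity` score (the number of distinct edit scripts between two cells) and the greedy grouping rule are all assumptions made for illustration. In particular, the greedy grouping stands in for the Hierarchical K-Means named in the abstract, whose details are not given here.

```python
from difflib import SequenceMatcher

def edit_script(src, tgt):
    """Position-free edit script turning src into tgt (non-equal opcodes only)."""
    ops = SequenceMatcher(None, src, tgt, autojunk=False).get_opcodes()
    return tuple((tag, src[i1:i2], tgt[j1:j2])
                 for tag, i1, i2, j1, j2 in ops if tag != "equal")

def ambiguity(paradigms, a, b):
    """Hypothetical score: number of distinct edit scripts mapping cell a to cell b."""
    return len({edit_script(p[a], p[b]) for p in paradigms})

def detect_principal_parts(paradigms, cells):
    # Step 1: characterize the relationship between every ordered pair of cells.
    amb = {(a, b): ambiguity(paradigms, a, b) for a in cells for b in cells}

    # Step 2 (assumed stand-in for the paper's clustering): a cell joins the first
    # cluster whose members it predicts, and is predicted by, with a single script.
    clusters = []
    for c in cells:
        for cluster in clusters:
            if all(amb[c, m] == 1 and amb[m, c] == 1 for m in cluster):
                cluster.append(c)
                break
        else:
            clusters.append([c])

    # Step 3: per cluster, pick the cell needing the fewest scripts overall.
    def cost(c):
        return sum(amb[c, b] for b in cells)

    predicted = [min(cluster, key=cost) for cluster in clusters]
    return clusters, predicted

# Toy data (illustrative only, not drawn from the paper's dataset).
CELLS = ["inf", "3sg", "prespart", "past", "pastpart"]
PARADIGMS = [
    dict(zip(CELLS, forms)) for forms in [
        ("walk", "walks", "walking", "walked", "walked"),
        ("play", "plays", "playing", "played", "played"),
        ("sing", "sings", "singing", "sang", "sung"),
        ("drink", "drinks", "drinking", "drank", "drunk"),
    ]
]

clusters, predicted = detect_principal_parts(PARADIGMS, CELLS)
print(clusters)
print(predicted)
```

On this toy data the infinitive, third-person singular and present participle fall into one group because each is derivable from the others by a single edit script, while the past and past participle each form their own group. This yields three predicted parts, in line with the traditional three principal parts of English verbs.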
Submission Number: 130