\textsc{Vorm}: Translations and a constrained hypothesis space support unsupervised morphological segmentation across languages

Published: 24 May 2025, Last Modified: 24 May 2025
Venue: CoNLL 2025
License: CC BY 4.0
Keywords: unsupervised morphological segmentation, low-resource languages, morphological typology, reduplication
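TL;DR: A new unsupervised morphological segmentation system that leverages translation data compares favourably to other unsupervised systems on standard benchmark data and on a novel dataset of 37 typologically diverse languages.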
Abstract: This paper introduces VORM, an unsupervised morphological segmentation system that leverages translation data to infer highly accurate morphological transformations, including less frequently modeled processes such as infixation and reduplication. The system is evaluated on standard benchmark data and on a novel dataset of 37 typologically diverse languages. In both cases, its results compare favourably to those of other unsupervised systems.
Submission Number: 215