\textsc{Vorm}: Translations and a constrained hypothesis space support unsupervised morphological segmentation across languages
Keywords: unsupervised morphological segmentation, low-resource languages, morphological typology, reduplication
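TL;DR: A new unsupervised morphological segmentation system that leverages translation data compares favourably to other unsupervised systems on standard benchmark data and on a novel dataset of 37 typologically diverse languages.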
Abstract: This paper introduces VORM, an unsupervised morphological segmentation system that leverages translation data to infer highly accurate morphological transformations, including less frequently modeled processes such as infixation and reduplication. The system is evaluated on standard benchmark data, as well as on a novel dataset of 37 typologically diverse languages. In both cases, its results compare favourably to those of other unsupervised systems.
Submission Number: 215