Unsupervised Segmentation of Words Using Prior Distributions of Morph Length and Frequency

2003 (modified: 16 Jul 2019), ACL 2003
Abstract: We present a language-independent, unsupervised algorithm for segmenting words into morphs. The algorithm is based on a new generative probabilistic model that makes use of relevant prior information on the length and frequency distributions of morphs in a language. The algorithm is shown to outperform two competing algorithms when evaluated on data from Finnish, a language with agglutinative morphology, and it also performs well on English data.