Abstract: This paper describes a joint model of word segmentation and phonological alternations, which takes unsegmented utterances as input and infers word segmentations and underlying phonological representations. The model is a Maximum Entropy or log-linear model, which can express a probabilistic version of Opti- mality Theory (OT; Prince and Smolensky (2004)), a standard phonological framework. The features in our model are inspired by OT's Markedness and Faithfulness constraints. Fol- lowing the OT principle that such features in- dicate "violations", we require their weights to be non-positive. We apply our model to a modified version of the Buckeye corpus (Pitt et al., 2007) in which the only phonological alternations are deletions of word-final /d/ and /t/ segments. The model sets a new state-of- the-art for this corpus for word segmentation, identification of underlying forms, and identi- fication of /d/ and /t/ deletions. We also show that the OT-inspired sign constraints on fea- ture weights are crucial for accurate identifi- cation of deleted /d/s; without them our model posits approximately 10 times more deleted underlying /d/s than appear in the manually annotated data.
0 Replies
Loading