Part-of-Speech Induction for Vietnamese

2013 (modified: 05 Nov 2021), KSE (2) 2013
Abstract: This paper presents a method for automatically inducing the parts of speech of the Vietnamese language from a large text corpus. We first build a class-based bigram language model using several statistical algorithms that assign words to classes based on their ability to combine with neighbouring words. We then show that this model is able to extract word classes that have the flavor of either syntactically based or semantically based groupings of Vietnamese words, two approaches that have long been disputed within the Vietnamese linguistic community. Finally, the quality of the word clusters is evaluated quantitatively by using word cluster features to improve the accuracy of a statistical part-of-speech tagger for Vietnamese.
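As a rough illustration of the class-based bigram idea, the sketch below scores a word-to-class assignment by the log-likelihood it gives a token sequence. It assumes the common factorization P(w_i | w_{i-1}) ≈ P(c_i | c_{i-1}) · P(w_i | c_i) with maximum-likelihood estimates; the abstract does not say which algorithms or estimators the paper actually uses, so the function name, the toy sentences, and the two class assignments are all hypothetical.

```python
import math
from collections import Counter


def class_bigram_log_likelihood(tokens, word_to_class):
    """Log-likelihood of tokens[1:] under an assumed class-based bigram model.

    Assumption (not taken from the paper): each bigram probability factors as
    P(c_i | c_{i-1}) * P(w_i | c_i), with both factors estimated by relative
    frequency on the same token sequence.
    """
    classes = [word_to_class[w] for w in tokens]
    word_counts = Counter(tokens)
    class_counts = Counter(classes)
    # Classes that occur as the left element of some bigram.
    prev_class_counts = Counter(classes[:-1])
    class_bigrams = Counter(zip(classes, classes[1:]))

    log_likelihood = 0.0
    for prev_c, c, w in zip(classes, classes[1:], tokens[1:]):
        p_transition = class_bigrams[(prev_c, c)] / prev_class_counts[prev_c]
        p_emission = word_counts[w] / class_counts[c]
        log_likelihood += math.log(p_transition * p_emission)
    return log_likelihood


# Hypothetical toy corpus: four three-word sentences run together.
tokens = "tôi ăn cơm tôi uống nước anh ăn cơm anh uống nước".split()

# Assignment that groups words by how they combine with their neighbours.
syntactic = {"tôi": 0, "anh": 0, "ăn": 1, "uống": 1, "cơm": 2, "nước": 2}
# Arbitrary assignment that ignores distribution.
arbitrary = {"tôi": 0, "ăn": 0, "cơm": 1, "uống": 1, "anh": 2, "nước": 2}

ll_syntactic = class_bigram_log_likelihood(tokens, syntactic)
ll_arbitrary = class_bigram_log_likelihood(tokens, arbitrary)
```

On this toy input the distribution-based assignment scores higher than the arbitrary one, which is the signal a clustering algorithm of this kind would try to maximize when moving words between classes.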