An Empirical Study Of Semi-Supervised Chinese Word Segmentation Using Co-TrainingDownload PDF

2013 (modified: 04 Sept 2019)EMNLP 2013Readers: Everyone
Abstract: In this paper we report an empirical study on semi-supervised Chinese word segmentation using co-training. We utilize two segmenters: 1) a word-based segmenter leveraging a word-level language model, and 2) a character-based segmenter using characterlevel features within a CRF-based sequence labeler. These two segmenters are initially trained with a small amount of segmented data, and then iteratively improve each other using the large amount of unlabelled data. Our experimental results show that co-training captures 20% and 31% of the performance improvement achieved by supervised training with an order of magnitude more data for the SIGHAN Bakeoff 2005 PKU and CU corpora respectively.
0 Replies

Loading