New word detection algorithm for Chinese based on extraction of local context information

2008 (modified: 15 Nov 2021), ISKE 2008, Readers: Everyone
Abstract: Chinese word segmentation is an important problem in Chinese text processing. Traditional segmentation methods that depend on an existing dictionary perform poorly when they encounter unknown words. This paper proposes a segmentation algorithm for Chinese based on extracting local context information. It adds the context information of the test text to a local PPM statistical model to guide the detection of new words. The algorithm, which focuses on online segmentation and new word detection, performs well in both closed and open tests and outperforms some well-known Chinese segmentation systems to a certain extent.
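The abstract does not give the details of the model, so the following is only an illustrative sketch of the general idea, assuming that "PPM" refers to prediction by partial matching. It shows a simplified order-1 character model with a PPM-style escape to order 0 (method C counts, no exclusion) whose statistics can be updated online from the text being segmented. The class name `LocalPPM`, its methods, the alphabet size, and the sample strings are all hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict


class LocalPPM:
    """Simplified order-1 character model with a PPM-style escape.

    Hypothetical illustration only; not the model described in the paper.
    """

    def __init__(self, alphabet_size=65536):
        # Assumed size of the character set used for the uniform fallback.
        self.alphabet_size = alphabet_size
        self.unigrams = Counter()            # order-0 counts
        self.bigrams = defaultdict(Counter)  # order-1 counts: prev -> next

    def update(self, text):
        """Add the character statistics of `text` to the model."""
        for i, ch in enumerate(text):
            self.unigrams[ch] += 1
            if i > 0:
                self.bigrams[text[i - 1]][ch] += 1

    def _order0(self, ch):
        total = sum(self.unigrams.values())
        distinct = len(self.unigrams)
        if total == 0:
            return 1.0 / self.alphabet_size
        if ch in self.unigrams:
            return self.unigrams[ch] / (total + distinct)
        # Escape from order 0 to a uniform distribution over the alphabet.
        return distinct / (total + distinct) / self.alphabet_size

    def prob(self, ch, prev):
        """Estimate P(ch | prev), escaping to order 0 when needed."""
        ctx = self.bigrams.get(prev)
        if not ctx:
            return self._order0(ch)
        total = sum(ctx.values())
        distinct = len(ctx)
        if ch in ctx:
            return ctx[ch] / (total + distinct)
        # Escape probability times the lower-order estimate.
        return distinct / (total + distinct) * self._order0(ch)


model = LocalPPM()
model.update("我们在北京学习")              # stand-in for background statistics
before = model.prob("客", "博")
model.update("博客很流行，写博客的人很多")  # stand-in for the text under test
after = model.prob("客", "博")
```

In this sketch, `after` is larger than `before`: once the text under test has been added, the pair "博客" is predicted much more strongly, which is the kind of local evidence a segmenter could use to treat a recurring unknown string as a candidate new word. How the paper actually turns such statistics into segmentation decisions is not stated in the abstract.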