Chinese Unknown Word Identification Using Character-based Tagging and Chunking

2003 (modified: 12 Nov 2022), ACL (Companion) 2003
Abstract: Since written Chinese has no spaces to delimit words, segmenting Chinese text is an essential task. During segmentation, the problem of unknown words arises: it is impossible to register all words in a dictionary, as new words can always be created by combining characters. We propose a unified solution for detecting unknown words in Chinese texts. First, morphological analysis is performed to obtain an initial segmentation and POS tags; then a chunker is used to detect unknown words.
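The two-stage idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `char_tags` and `merge_chunks`, the B/I/E/S position suffixes, the B/I/O chunk labels, and the example POS tags are all assumptions made for illustration. The first function breaks an initial (word, POS) segmentation into per-character tags; the second merges characters that a chunker has labeled as part of an unknown word.

```python
def char_tags(words):
    """Convert (word, pos) pairs into per-character (char, tag) pairs.

    Assumed scheme: the tag is the POS plus a position suffix,
    B (begin), I (inside), E (end), or S (single-character word).
    """
    tagged = []
    for word, pos in words:
        n = len(word)
        for i, ch in enumerate(word):
            if n == 1:
                position = "S"
            elif i == 0:
                position = "B"
            elif i == n - 1:
                position = "E"
            else:
                position = "I"
            tagged.append((ch, f"{pos}-{position}"))
    return tagged


def merge_chunks(chars, labels):
    """Join characters into unknown-word candidates.

    Assumed labels: "B" starts a candidate, "I" continues it,
    and "O" marks a character outside any candidate.
    """
    found, current = [], ""
    for ch, label in zip(chars, labels):
        if label == "B":
            if current:
                found.append(current)
            current = ch
        elif label == "I" and current:
            current += ch
        else:
            if current:
                found.append(current)
            current = ""
    if current:
        found.append(current)
    return found
```

In a real pipeline the labels passed to `merge_chunks` would come from a trained chunker operating on the character tags; here they would have to be supplied by hand.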