Chinese Sentence Tokenization Using Viterbi Decoder

Haizhou Li, Zhiwei Lin, Shuanhu Bai

Published: 1998, Last Modified: 13 Nov 2025. Venue: ISCSLP 1998. License: CC BY-SA 4.0

Abstract: This paper proposes an approach to Chinese sentence tokenization in which word segmentation and text normalization are performed at the same time within the framework of Viterbi decoding. In the process, not only lexical words but also new word classes can be identified. The approach is shown to be practical for sentence tokenization in n-gram statistical language modeling.
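
To illustrate the general idea of segmenting and normalizing in one Viterbi pass, here is a minimal Python sketch. It is not the authors' implementation: the toy lexicon, the made-up unigram scores, the `<NUM>` word class for digit strings, and the names `candidates` and `viterbi_tokenize` are all assumptions introduced for illustration, and a unigram score stands in for whatever model the paper actually uses.

```python
import math

# Hypothetical toy lexicon with made-up unigram scores (not from the paper).
LEXICON = {
    "我": 0.2,
    "有": 0.2,
    "本": 0.1,
    "书": 0.1,
    "北": 0.01,
    "京": 0.01,
    "北京": 0.05,
    "<NUM>": 0.1,  # assumed word class standing in for any digit string
}
MAX_WORD_LEN = 4


def candidates(sentence, start):
    """Yield (end, token) pairs for each lexicon word or class match at start."""
    # Lexical words: every substring from start that is in the lexicon.
    limit = min(len(sentence), start + MAX_WORD_LEN)
    for end in range(start + 1, limit + 1):
        piece = sentence[start:end]
        if piece in LEXICON:
            yield end, piece
    # Word class: the run of digits beginning at start is normalized to <NUM>.
    if sentence[start].isdigit():
        end = start
        while end < len(sentence) and sentence[end].isdigit():
            end += 1
        yield end, "<NUM>"


def viterbi_tokenize(sentence):
    """Return the highest-scoring token sequence covering the whole sentence."""
    n = len(sentence)
    best = [-math.inf] * (n + 1)  # best log score of any path ending at i
    back = [None] * (n + 1)       # (previous position, token) on that path
    best[0] = 0.0
    for start in range(n):
        if best[start] == -math.inf:
            continue  # position not reachable by any segmentation
        for end, token in candidates(sentence, start):
            score = best[start] + math.log(LEXICON[token])
            if score > best[end]:
                best[end] = score
                back[end] = (start, token)
    if best[n] == -math.inf:
        raise ValueError("no segmentation covers the sentence")
    # Follow back-pointers from the end of the sentence to recover the tokens.
    tokens = []
    pos = n
    while pos > 0:
        pos, token = back[pos]
        tokens.append(token)
    return tokens[::-1]
```

In this sketch, lexicon words and class matches compete as arcs in the same lattice, so a digit string is replaced by its class token during segmentation rather than in a separate normalization step. Extending the score to an n-gram model would require carrying the previous token(s) in the Viterbi state, which is omitted here.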

External IDs: dblp:conf/iscslp/0001LB98