Chinese Lexical Normalization Based on Information Extraction: An Experimental Study

Tian Tian, Weiran Xu

Published: 2017, Last Modified: 01 Oct 2024, ICANN (2) 2017, CC BY-SA 4.0

Abstract: In this work, we describe a novel method for normalizing Chinese informal words to their standard equivalents. We frame the task as an information extraction problem, using answers from Q&A communities as the source corpus. We propose several LSTM-based models for the extraction task. To evaluate and compare the performance of the proposed models, we developed a standard dataset containing factoids generated by real-world users in daily life. Since our method does not use any linguistic features, it is also applicable to other languages.