Pinyin-bert: A new solution to the Chinese pinyin-to-character conversion task

Anonymous

16 Nov 2021 (modified: 05 May 2023)
ACL ARR 2021 November Blind Submission
Readers: Everyone
Abstract: The pinyin-to-character (P2C) conversion task is the key task of the Input Method Engine (IME) in commercial input software for Asian languages such as Chinese, Japanese, and Thai. The dominant technique is the Ngram language model combined with a smoothing technique. However, the Ngram model's low capacity limits its performance. Following the trend toward deep learning, this paper adopts the powerful BERT network architecture and proposes Pinyin-bert to solve the P2C task, achieving a substantial performance improvement over the Ngram model. Furthermore, we combine Pinyin-bert with the Ngram model under a Markov model framework, which improves performance further. Lastly, we design a way to incorporate an external lexicon into Pinyin-bert so that it can adapt to out-of-domain text.
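The abstract names the Ngram language model with smoothing as the dominant P2C technique. As a rough illustration of that kind of baseline (not of Pinyin-bert, and not necessarily the exact smoothing scheme the authors compare against), the sketch below decodes a pinyin sequence with a bigram model using add-one smoothing and a Viterbi search over per-syllable candidate characters. The candidate table `CANDIDATES`, the toy corpus `CORPUS`, and all function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy candidate table: each pinyin syllable maps to the characters
# an IME might offer for it. Illustrative only; not taken from the paper.
CANDIDATES = {
    "zhong": ["中", "种", "重"],
    "guo": ["国", "过", "果"],
    "ren": ["人", "认", "任"],
}

# Hypothetical toy corpus used to estimate bigram counts.
CORPUS = ["中国人", "中国", "国人", "重过", "认过"]

BOS = "<s>"  # sentence-start marker


def train_bigram(corpus):
    """Count context unigrams and bigrams, prefixing each sentence with BOS."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        chars = [BOS] + list(sentence)
        unigrams.update(chars)
        bigrams.update(zip(chars, chars[1:]))
    return unigrams, bigrams


def bigram_logprob(prev, cur, unigrams, bigrams, vocab_size):
    """Add-one (Laplace) smoothed log P(cur | prev); unseen pairs stay nonzero."""
    return math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))


def viterbi_p2c(pinyins, unigrams, bigrams):
    """Return the most probable character string for a pinyin syllable sequence."""
    # Possible next characters: corpus characters plus every candidate character.
    vocab = {c for c in unigrams if c != BOS}
    vocab |= {c for cands in CANDIDATES.values() for c in cands}
    # best[char] = (log-probability, best path ending in char)
    best = {BOS: (0.0, [])}
    for syllable in pinyins:
        best = {
            cur: max(
                (score + bigram_logprob(prev, cur, unigrams, bigrams, len(vocab)),
                 path + [cur])
                for prev, (score, path) in best.items()
            )
            for cur in CANDIDATES[syllable]
        }
    _, path = max(best.values())
    return "".join(path)


unigrams, bigrams = train_bigram(CORPUS)
print(viterbi_p2c(["zhong", "guo", "ren"], unigrams, bigrams))  # → 中国人
```

Add-one smoothing is used here only for brevity; stronger schemes such as Kneser-Ney smoothing are generally preferred for real language models. Scores are summed in log space so that long syllable sequences do not underflow.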