Pinyin-BART: An End-to-End Chinese Input Method

Anonymous

08 Mar 2022 (modified: 05 May 2023)
NAACL 2022 Conference Blind Submission
Readers: Everyone
Paper Link: https://openreview.net/forum?id=fQlt1e4ZD_8
Paper Type: Long paper (up to eight pages of content + unlimited references and appendices)
Abstract: A Chinese Input Method Engine (IME) helps users convert a keystroke sequence into the desired Chinese character sequence. It is usually a cascaded process in which the original input sequence is first corrected to remove typos, then segmented into a pinyin token sequence, and finally converted into a Chinese character sequence. Errors tend to accumulate and propagate through that pipeline. This paper formulates the process as a Key-to-Character (K2C) conversion task and solves it in a unified end-to-end way. We propose Pinyin-BART, which effectively solves the error propagation problem and significantly improves IME performance in experiments. Moreover, we model users' real input behavior and design a method to generate a training corpus with typos for the K2C task, which further improves the robustness of Pinyin-BART. Finally, we design a non-autoregressive (NAR) decoder for Pinyin-BART and obtain a speedup of more than 9x with limited performance degradation, which makes deployment in commercial input software feasible.
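
To make the error-propagation argument concrete, the following is a minimal sketch of the segmentation stage of a cascaded IME. It is not the method of the paper: the syllable inventory `SYLLABLES` and the function `segmentations` are hypothetical names introduced only for illustration, and a real IME would use the full pinyin syllable table. The sketch shows two things: a clean keystroke sequence can already have several valid segmentations, and a single uncorrected typo can leave the segmenter with no valid output at all, so the downstream conversion stage has nothing correct to work with.

```python
from typing import List

# Toy syllable inventory (an assumption for illustration only;
# a real IME would use the complete pinyin syllable table).
SYLLABLES = {"xi", "an", "xian", "fang", "fan", "ge"}


def segmentations(keys: str) -> List[List[str]]:
    """Return every way to split `keys` into syllables from SYLLABLES."""
    if not keys:
        # The empty string has exactly one segmentation: no syllables.
        return [[]]
    results: List[List[str]] = []
    for end in range(1, len(keys) + 1):
        head = keys[:end]
        if head in SYLLABLES:
            # Extend every segmentation of the remainder with this prefix.
            for tail in segmentations(keys[end:]):
                results.append([head] + tail)
    return results


if __name__ == "__main__":
    # Clean input: two competing segmentations ("xi an" vs. "xian").
    print(segmentations("xian"))
    # Input with a transposition typo that the correction stage missed:
    # the segmenter finds no valid split.
    print(segmentations("xain"))
```

In a cascade, the segmenter must commit to one of the candidates for "xian" before conversion, and it fails outright on "xain" unless the typo was fixed upstream. An end-to-end K2C model, as described in the abstract, maps the raw keystrokes to characters directly and so avoids committing to these intermediate decisions.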