Abstract: A Chinese Input Method Engine (IME) helps users convert a keystroke sequence into the desired Chinese character sequence. This is usually a cascaded process: the raw input sequence is first corrected to remove typos, then segmented into a pinyin token sequence, and finally converted into a Chinese character sequence. Errors tend to accumulate and propagate through this pipeline. This paper formulates the process as a Key-to-Character (K2C) conversion task and solves it in a unified end-to-end manner. We propose PIANO (Pinyin bIdirectional non-Auto-regressive nOise-robust Transformers), which effectively addresses the error propagation problem and significantly improves IME performance in our experiments. Moreover, we model users' real input behavior and design a method to generate a massive training corpus with typos for the K2C task, which further improves the robustness of PIANO. Finally, we design a non-autoregressive (NAR) decoder for PIANO and obtain more than 9x acceleration with limited performance degradation, which makes deployment in commercial input software feasible.
Paper Type: long
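
The abstract mentions generating a training corpus with typos for the K2C task but does not describe the procedure. As a rough illustration of what such noise injection could look like, the sketch below corrupts a clean keystroke string with four common typing errors (neighbor-key substitution, deletion, insertion, and adjacent swap). The keyboard layout table, the function names `build_neighbors` and `add_typos`, the uniform choice among error types, and the `rate` parameter are all assumptions made for this example; none of them is taken from the paper.

```python
import random

# Assumption: a plain QWERTY layout, where "neighbors" are the keys
# directly left and right in the same row. A real user-behavior model
# would likely be richer than this.
QWERTY_ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]


def build_neighbors(rows):
    """Map each key to the keys adjacent to it in its keyboard row."""
    neighbors = {}
    for row in rows:
        for i, key in enumerate(row):
            adjacent = []
            if i > 0:
                adjacent.append(row[i - 1])
            if i < len(row) - 1:
                adjacent.append(row[i + 1])
            neighbors[key] = adjacent
    return neighbors


NEIGHBORS = build_neighbors(QWERTY_ROWS)


def add_typos(keys, rate, rng):
    """Return a noisy copy of `keys`; each key is corrupted with probability `rate`."""
    out = []
    i = 0
    while i < len(keys):
        ch = keys[i]
        # Keep the key unchanged most of the time, and always keep
        # characters that are not on the layout table.
        if rng.random() >= rate or ch not in NEIGHBORS:
            out.append(ch)
            i += 1
            continue
        op = rng.choice(["substitute", "delete", "insert", "swap"])
        if op == "substitute":
            out.append(rng.choice(NEIGHBORS[ch]))  # hit a neighboring key
        elif op == "insert":
            out.append(ch)
            out.append(rng.choice(NEIGHBORS[ch]))  # extra stray key
        elif op == "swap" and i + 1 < len(keys):
            out.append(keys[i + 1])  # transpose with the next key
            out.append(ch)
            i += 1
        elif op == "swap":
            out.append(ch)  # last key: nothing to swap with
        # "delete": the key is dropped, so nothing is appended
        i += 1
    return "".join(out)


if __name__ == "__main__":
    rng = random.Random(0)
    clean = "nihaoshijie"  # pinyin keystrokes for "你好世界"
    # Each (noisy, clean) pair could serve as one noisy-input example.
    for _ in range(3):
        print(add_typos(clean, 0.15, rng), "<-", clean)
```

Pairing each noisy string with the Chinese text of its clean source would give an end-to-end model training examples in which the typo is still present in the input, so no separate correction stage is needed. This is one plausible reading of the K2C setup, not a description of the authors' pipeline.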