Abstract: One of the key challenges in Chinese spelling check (CSC) is ensuring that modifications remain faithful to the original intent of the sentence. Confusion sets are commonly used to mitigate this issue; however, it is challenging to construct high-quality confusion sets and integrate them into the model. In this paper, we propose a plug-and-play DISC (Decoding Intervention with Similarity of Characters) module for CSC models to address these challenges. DISC measures phonetic and glyph similarities between characters and incorporates this similarity information in the decoding stage. This method can be easily integrated into various existing CSC models, such as ReaLiSe, SCOPE, and ReLM, without additional training costs. Experiments on three CSC benchmarks demonstrate that our proposed method significantly improves model performance, approaching and even surpassing the current state-of-the-art models.
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: Chinese spelling correction
Contribution Types: NLP engineering experiment
Languages Studied: Chinese
Submission Number: 5256
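
Illustrative sketch: the abstract describes DISC as incorporating character similarity at the decoding stage, and the toy Python example below shows one way such an intervention could look. It is not the authors' implementation. The additive scoring rule, the function name `intervene`, the weight `alpha`, the similarity values, and the candidate probabilities are all assumptions made for illustration; the abstract does not specify the actual formula.

```python
import math

def intervene(probs, source_char, similarity, alpha=1.0):
    """Pick a replacement for source_char from a model's candidate distribution.

    Hypothetical scoring rule (an assumption, not taken from the paper):
        score(c) = log p(c) + alpha * sim(source_char, c)
    where sim is 1.0 for the source character itself and otherwise looked up
    in a table of combined phonetic/glyph similarity (0.0 if absent).
    """
    best_char, best_score = None, -math.inf
    for cand, p in probs.items():
        if p <= 0.0:
            continue  # skip impossible candidates to avoid log(0)
        if cand == source_char:
            sim = 1.0
        else:
            sim = similarity.get((source_char, cand), 0.0)
        score = math.log(p) + alpha * sim
        if score > best_score:
            best_char, best_score = cand, score
    return best_char

# Toy numbers, invented for illustration only.
probs = {"账": 0.40, "户": 0.45, "帐": 0.15}             # model output at one position
similarity = {("帐", "账"): 0.9, ("帐", "户"): 0.0}      # hypothetical similarity table

without_disc = intervene(probs, "帐", similarity, alpha=0.0)  # plain argmax
with_disc = intervene(probs, "帐", similarity, alpha=1.0)     # similarity-aware
```

With `alpha=0.0` the function reduces to ordinary argmax decoding and selects the dissimilar character with the highest probability. With `alpha=1.0` the similar candidate wins, which reflects the goal of keeping corrections close to the original character. Because the adjustment happens only at decoding time, a sketch like this needs no retraining of the underlying model.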