Revisiting the Knowledge Recall and Selection in Chinese Spelling Correction

ACL ARR 2024 June Submission5036 Authors

16 Jun 2024 (modified: 02 Jul 2024), ACL ARR 2024 June Submission, CC BY 4.0
Abstract: Chinese Spelling Correction (CSC) is a challenging task in natural language processing. Performance gains have nevertheless been limited, primarily because little external knowledge is infused into the models. Previous work used confusion sets as additional knowledge, but these sets were too small and served only as an additional feature. To address this, we propose a knowledge recall and selection network (ReSC). First, four recall methods together achieve an average recall rate above 93%, recalling around 150 related characters or words per character. Subsequently, we propose a Knowledge Selection Algorithm that chooses the appropriate characters or words from the numerous recall sets. The knowledge selection network is highly effective, with an F1 score of nearly 100%. Extensive experiments show that ReSC can inject a substantial number of entities while achieving an even lower False Positive Rate. This novel network achieves new SOTA results across three domain-specific datasets.
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: Chinese Spelling Correction; Knowledge Representation; Knowledge Injection
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Reproduction study, Publicly available software and/or pre-trained models, Data analysis
Languages Studied: Chinese
Submission Number: 5036
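
The abstract reports an average recall rate for the candidate sets and a typical candidate-set size, but does not define how either is measured. The sketch below shows one plausible reading, assuming recall is the fraction of erroneous positions whose gold character appears in the recalled candidate set. The function names `candidate_recall` and `mean_candidates` and the toy data are hypothetical and are not taken from the submission.

```python
def candidate_recall(samples):
    """Fraction of positions whose gold character is in the candidate set.

    samples: list of (gold_char, candidate_set) pairs, one per erroneous
    position. This definition is an assumption, not the paper's stated metric.
    """
    if not samples:
        return 0.0
    hits = sum(1 for gold, candidates in samples if gold in candidates)
    return hits / len(samples)


def mean_candidates(samples):
    """Average number of recalled candidates per position."""
    if not samples:
        return 0.0
    return sum(len(candidates) for _, candidates in samples) / len(samples)


# Hypothetical toy data: gold correction and the recalled candidates.
toy = [
    ("在", {"再", "在", "载"}),
    ("的", {"得", "地"}),        # gold character missed by recall
    ("做", {"作", "做", "坐"}),
    ("哪", {"那", "哪"}),
]

recall = candidate_recall(toy)
avg_size = mean_candidates(toy)
```

Under this reading, a larger candidate set raises recall but leaves more work for the selection step, which is the trade-off the abstract's two-stage design addresses.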