Driving Chinese Spelling Correction from a Fine-Grained Perspective

ACL ARR 2024 April Submission 613 Authors

16 Apr 2024 (modified: 19 May 2024), ACL ARR 2024 April Submission, CC BY 4.0
Abstract: This paper examines Chinese spelling correction (CSC) from a fine-grained perspective, observing that existing evaluations lack a nuanced typology of spelling errors. This deficiency can give a misleading impression of model performance and creates an "invisible" bottleneck that hinders progress in CSC research. We categorize spelling errors into six types and conduct a fine-grained evaluation across a wide variety of models, including tagging models, ReLM, and LLMs. This evaluation pinpoints two underlying weaknesses of existing state-of-the-art models: utilizing contextual clues and handling the co-existence of multiple typos. However, these two types of errors occur very rarely in conventional training corpora. We therefore introduce new error generation methods that artificially increase their occurrence. With the augmented data, we improve the overall performance of prior CSC models by boosting their performance on these specific errors. We hope that this work provides fresh insight for future CSC research.
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: Chinese spelling correction
Contribution Types: Model analysis & interpretability, Data resources
Languages Studied: Chinese
Submission Number: 613