CG-CSC: A Counterfactual Generation Method for Improving Chinese Spelling Correction

ACL ARR 2025 May Submission6346 Authors

20 May 2025 (modified: 03 Jul 2025) | ACL ARR 2025 May Submission | CC BY 4.0
Abstract: Chinese Spelling Correction (CSC) aims to detect and correct misspelled characters in Chinese text, a prerequisite for reliable downstream Natural Language Processing (NLP) applications. Although existing methods achieve promising performance, they still suffer from spurious correlations caused by long-tailed data distributions. This leads to over-correction on high-frequency (head) character mappings and under-correction on rare or unseen mappings. To address this, we propose Counterfactual Generation for Chinese Spelling Correction (CG-CSC), a causally grounded framework that synthesizes counterfactual pairs to balance the training data distribution. Experimental results on three widely used SIGHAN benchmarks show that our method significantly improves correction performance, particularly on rare cases and cases unseen during training, demonstrating enhanced robustness and generalization.
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: grammatical error correction
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: Chinese
Keywords: Chinese Spelling Correction, Counterfactual Generation, Long-Tailed Distribution
Submission Number: 6346
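The abstract does not describe how CG-CSC actually constructs its counterfactual pairs. The Python sketch below is only a minimal illustration of the general idea of rebalancing a long-tailed set of correction mappings, and it is not the authors' procedure. It assumes length-aligned (misspelled, correct) sentence pairs, as is usual for SIGHAN-style CSC data. It counts character-level mappings (wrong character -> correct character) and then, for each mapping seen fewer than a hypothetical `target_count` times, synthesizes new pairs by injecting the misspelled character into other correct sentences that contain the target character. The functions `extract_mappings` and `generate_counterfactuals`, the threshold, and the toy data are all hypothetical.

```python
import random
from collections import Counter


def extract_mappings(pairs):
    """Count (misspelled char, correct char) substitutions in aligned pairs."""
    counts = Counter()
    for src, tgt in pairs:
        # CSC pairs are assumed length-preserving, so characters align by position.
        if len(src) != len(tgt):
            continue
        for s, t in zip(src, tgt):
            if s != t:
                counts[(s, t)] += 1
    return counts


def generate_counterfactuals(pairs, counts, target_count, rng):
    """Hypothetical balancing step (not the CG-CSC method).

    For each mapping (wrong, right) seen fewer than target_count times, pick
    correct sentences containing `right` and replace one occurrence with
    `wrong`, placing the rare mapping into new contexts.
    """
    targets = [tgt for _, tgt in pairs]
    synthetic = []
    for (wrong, right), n in counts.items():
        deficit = target_count - n
        if deficit <= 0:
            continue  # head mapping: already frequent enough
        candidates = [t for t in targets if right in t]
        for _ in range(deficit):
            if not candidates:
                break
            tgt = rng.choice(candidates)
            positions = [i for i, ch in enumerate(tgt) if ch == right]
            i = rng.choice(positions)
            src = tgt[:i] + wrong + tgt[i + 1:]
            synthetic.append((src, tgt))
    return synthetic


# Toy usage: (心 -> 兴) is a head mapping, (西 -> 习) is a tail mapping.
pairs = [
    ("我今天很高心", "我今天很高兴"),
    ("我很高心见到你", "我很高兴见到你"),
    ("他在学校学西", "他在学校学习"),
    ("我们一起复习", "我们一起复习"),  # already-correct sentence
]
counts = extract_mappings(pairs)
synthetic = generate_counterfactuals(pairs, counts, target_count=2, rng=random.Random(0))
print(synthetic)
```

A real counterfactual generator would presumably also control for context plausibility and confusion-set constraints (phonological or visual similarity); the random substitution here only shows how tail mappings can be upsampled into new contexts.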