Max Local Entropy Error Generation for Semantic Spelling Correction in Chinese


17 Feb 2023 (modified: 05 May 2023), ACL ARR 2023 February Blind Submission, Readers: Everyone
Abstract: Chinese spelling correction (CSC) is the task of detecting and correcting spelling errors in Chinese text. Some Chinese spelling errors are semantic errors, which cannot be corrected using syntax rules and local context alone; correcting them requires global semantic information. BERT-based models have proven effective for the CSC task. However, because existing datasets contain few semantic errors, the ability of these models to capture global semantic information is weakened, which leaves them vulnerable to real-world examples. To address this, we propose a method called MLEEG (Max Local Entropy Error Generation) that generates adversarial examples containing semantic errors. Experimental results show that BERT-based CSC models are vulnerable to adversarial examples generated by MLEEG, and that adding MLEEG adversarial examples can improve the robustness of BERT-based CSC models without decreasing their performance on existing datasets.
Paper Type: short
Research Area: NLP Applications
