PROTECT: Parameter-Efficient Tuning for Few-Shot Robust Chinese Text Correction

Published: 01 Jan 2024, Last Modified: 19 May 2025 · IEEE/ACM Trans. Audio Speech Lang. Process. 2024 · CC BY-SA 4.0
Abstract: Non-normative texts and euphemisms are widely spread on the Internet, making content moderation more difficult. These phenomena result from misspelling errors or deliberate textual attacks by users, yet current methods lack a robust design and are vulnerable to adversarial attacks. Exploring robust text correction is therefore of great significance: text correction aims to automatically detect and correct errors in sentences and can be employed as a defense against adversarial attacks. In this work, we present PROTECT, a robust Chinese language model for more general Chinese text correction. In particular, we propose a simple but effective self-supervised learning approach for calibrating generalizable representations that adapt robustly to changing adversarial texts. Specifically, we develop an adversarial-aware multi-feature representation method that establishes auxiliary supervision. To the best of our knowledge, PROTECT is the first model to represent perfect pinyin, abbreviation pinyin, character split, visual, and phonetic features simultaneously. Based on the generated adversarial examples, PROTECT is trained from scratch through a unified text-to-text generation paradigm, which empowers the model to simultaneously correct text errors of multiple types and with inconsistent lengths. After obtaining adversarially robust representations, we design a novel parameter-efficient tuning method consisting of context-specific adaptive prefix and semantic-consistent low-rank adaptation modules to enable zero-shot and few-shot learning. Extensive experimental results show that by tuning only 0.2% of the parameters, PROTECT achieves the best performance in both the full-data and low-resource settings.
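The abstract does not include an implementation of the parameter-efficient tuning method, so the sketch below only illustrates the two generic building blocks it names, prefix tuning and low-rank adaptation (LoRA), as minimal PyTorch modules. All class names, ranks, dimensions, and the way the prefix is attached are illustrative assumptions and do not reproduce PROTECT's context-specific adaptive prefix or semantic-consistent LoRA design.

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen pretrained linear layer plus a trainable low-rank update: y = W x + scale * (B A) x."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # only the low-rank factors are tuned
        self.lora_A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling


class PrefixAdapter(nn.Module):
    """Trainable prefix vectors prepended to the token embeddings of each batch."""

    def __init__(self, prefix_len: int = 16, hidden_size: int = 768):
        super().__init__()
        self.prefix = nn.Parameter(torch.randn(prefix_len, hidden_size) * 0.02)

    def forward(self, embeddings: torch.Tensor) -> torch.Tensor:
        # embeddings: (batch, seq_len, hidden_size)
        batch = embeddings.size(0)
        prefix = self.prefix.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([prefix, embeddings], dim=1)


if __name__ == "__main__":
    layer = LoRALinear(nn.Linear(768, 768), rank=8)
    prefix = PrefixAdapter(prefix_len=16, hidden_size=768)
    x = torch.randn(2, 32, 768)      # dummy token embeddings
    out = layer(prefix(x))           # shape: (2, 48, 768)
    params = list(layer.parameters()) + list(prefix.parameters())
    trainable = sum(p.numel() for p in params if p.requires_grad)
    print(out.shape, f"trainable params in this toy example: {trainable}")
```

Only the prefix vectors and the two low-rank factors receive gradients; the backbone stays frozen, which is the general mechanism that lets a method of this kind tune a very small fraction of the total parameters.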