Enhancing Chinese Offensive Language Detection with Homophonic Perturbation

ACL ARR 2025 May Submission3533 Authors

19 May 2025 (modified: 08 Jul 2025), ACL ARR 2025 May Submission, CC BY 4.0
Abstract: Detecting offensive language in Chinese is challenging because users substitute homophonic characters to evade detection. We propose a framework that improves the robustness of large language models against such phonetic attacks. First, we construct HED-COLD, a homophone-enhanced dataset built on the Chinese Offensive Language Dataset (COLD). Second, we propose a homophone-aware pretraining strategy that combines semantic alignment and feature fusion to learn robust mappings between original and homophone-perturbed text. Experimental results show that our approach achieves state-of-the-art performance on both the COLD test set and the toxicity benchmark ToxiCloakCN. Gains are larger in domains especially prone to homophonic attacks, such as gender- and region-related content. These results demonstrate improved robustness and generalization against phonetic adversarial attacks.
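The abstract does not say how HED-COLD's perturbations are generated. Purely as an illustration of what homophonic perturbation means, here is a minimal sketch that assumes a simple character-level scheme: each character that appears in a homophone table is swapped, with some probability, for a character sharing its toneless pinyin, and the original label is kept. The table `HOMOPHONES` and the functions `perturb` and `build_pairs` are hypothetical. A real pipeline would derive candidates from a pronunciation lexicon rather than a hand-written dictionary, and this is not presented as the authors' method.

```python
import random

# Hypothetical homophone table (illustrative only): each key maps to other
# characters with the same toneless pinyin. A real system would build this
# from a pronunciation lexicon instead of hard-coding a few entries.
HOMOPHONES: dict[str, list[str]] = {
    "你": ["尼", "泥"],  # ni
    "妈": ["玛", "马"],  # ma
    "傻": ["沙", "煞"],  # sha
}


def perturb(text: str, rate: float, rng: random.Random) -> str:
    """Swap each character that has homophone candidates with probability `rate`."""
    out = []
    for ch in text:
        candidates = HOMOPHONES.get(ch)
        # rng.random() is in [0, 1), so rate=1.0 always swaps and rate=0.0 never does.
        if candidates and rng.random() < rate:
            out.append(rng.choice(candidates))
        else:
            out.append(ch)
    return "".join(out)


def build_pairs(
    samples: list[tuple[str, int]], rate: float = 0.3, seed: int = 0
) -> list[tuple[str, str, int]]:
    """Return (original, perturbed, label) triples; the label is carried over
    unchanged on the assumption that a homophone swap preserves intent."""
    rng = random.Random(seed)
    return [(text, perturb(text, rate, rng), label) for text, label in samples]


if __name__ == "__main__":
    for orig, pert, label in build_pairs([("你妈的傻子", 1)], rate=1.0):
        print(orig, "->", pert, label)
```

Keeping the original text next to its perturbed variant, rather than replacing it, yields the kind of paired data an alignment objective between original and perturbed text would need.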
Paper Type: Long
Research Area: Computational Social Science and Cultural Analytics
Research Area Keywords: Computational Social Science and Cultural Analytics, Dialogue and Interactive Systems, Multilingualism and Cross-Lingual NLP
Languages Studied: English, Chinese
Submission Number: 3533