CangjieToxi: A Chinese Offensive Language Detection Benchmark with Radical-Level Perturbations

ACL ARR 2025 February Submission8528 Authors

16 Feb 2025 (modified: 09 May 2025), CC BY 4.0
Abstract: We introduce CangjieToxi, a benchmark dataset for detecting covert offensive language on Chinese social media. Existing detection systems are often ineffective against evasion techniques that manipulate character structure to bypass censorship. We focus on two perturbation methods: character splitting and character substitution. Character splitting breaks offensive words into components that are visually similar to the original but contextually distinct, while character substitution replaces offensive characters with visually similar, non-offensive ones, concealing the original intent. Our dataset incorporates both techniques to create more complex forms of toxicity that are difficult for traditional models to detect. Extensive experiments with state-of-the-art models reveal their limitations in handling these perturbations and demonstrate the need for more robust systems. This work provides a resource for improving the detection of cloaked offensive language and contributes to the development of censorship-resistant detection methods.
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: benchmarking, language resources, lexical semantic change
Contribution Types: Data resources
Languages Studied: Chinese
Submission Number: 8528
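
The two perturbations named in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the lookup tables, the function names, and the neutral example characters below are all hypothetical and chosen only to show the mechanics of splitting a character into components and swapping a character for a look-alike.

```python
# Toy illustration of character splitting and character substitution.
# Both tables are hypothetical and use neutral characters; they are
# NOT taken from the CangjieToxi dataset.

# Splitting: one character becomes a sequence of its visible components.
SPLIT_TABLE = {
    "好": "女子",  # 好 is written as 女 + 子
    "明": "日月",  # 明 is written as 日 + 月
    "林": "木木",  # 林 is written as 木 + 木
}

# Substitution: one character is replaced by a visually similar one.
SUBSTITUTE_TABLE = {
    "己": "已",
    "未": "末",
    "土": "士",
}


def split_characters(text: str, table: dict = SPLIT_TABLE) -> str:
    """Replace each character found in `table` with its components."""
    return "".join(table.get(ch, ch) for ch in text)


def substitute_characters(text: str, table: dict = SUBSTITUTE_TABLE) -> str:
    """Replace each character found in `table` with its look-alike."""
    return "".join(table.get(ch, ch) for ch in text)


if __name__ == "__main__":
    print(split_characters("明天好"))     # 日月天女子
    print(substitute_characters("自己"))  # 自已
```

Characters missing from a table pass through unchanged, so a keyword-matching detector would still see most of the sentence intact while the targeted word no longer matches its lexicon entry.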