Abstract: Lightweight Large Language Models (LLMs) have demonstrated notable safety alignment deficits, particularly outside of English. These challenges are especially acute for Traditional Chinese (TC), stemming from distinct linguistic characteristics and a scarcity of dedicated safety resources. To address this, we introduce the Prompt Assortment for Traditional Chinese Hazards (PATCH) dataset, the first large-scale adversarial dataset tailored for TC safety evaluation, aligned with standard threat taxonomies. Using PATCH, we evaluated Llama Guard, RoBERTa, and Longformer architectures with full fine-tuning, Low-Rank Adaptation (LoRA), and Chat-Vector methods. Our findings demonstrate that parameter-efficient LoRA achieves classification performance (F1 > 0.99) comparable to full fine-tuning, providing an effective and efficient method for developing TC safety classifiers. We also find initial evidence suggesting targeted LoRA tuning may offer cross-lingual safety benefits.
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: corpus creation, datasets for low resource languages, evaluation methodology, benchmarking
Contribution Types: NLP engineering experiment, Publicly available software and/or pre-trained models, Data resources, Data analysis
Languages Studied: Traditional Chinese, English
Submission Number: 215