Lost in Localization: Building RabakBench with Human-in-the-Loop Validation to Measure Multilingual Safety Gaps
Keywords: AI Safety, Multilingual, Benchmark, Machine Translation, Low-Resource, Synthetic Data Generation
Abstract: Large language models (LLMs) often fail to maintain safety in low-resource language varieties, such as code-mixed vernaculars and regional dialects. We introduce RabakBench, a multilingual safety benchmark and scalable pipeline localized to Singapore’s unique linguistic landscape, covering Singlish, Chinese, Malay, and Tamil. We construct the benchmark through a novel three-stage pipeline: (1) Generate: augmenting real-world unsafe web content via LLM-driven red teaming; (2) Label: applying semi-automated multi-label annotation using majority-voted LLM labelers; and (3) Translate: performing high-fidelity, toxicity-preserving translation. The resulting dataset contains over 5,000 examples across six fine-grained safety categories. Despite using LLMs for scalability, our framework maintains rigorous human oversight, achieving 0.70–0.80 inter-annotator agreement. Evaluations of 13 state-of-the-art guardrails reveal significant performance degradation, underscoring the need for localized evaluation. RabakBench provides a reproducible framework for building safety benchmarks in underserved communities.
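The Label stage's majority-voted aggregation can be illustrated with a minimal sketch. This is a hypothetical implementation, not the paper's actual code: it assumes each LLM labeler emits a set of category labels per example and keeps a label only when a strict majority of labelers assigned it.

```python
from collections import Counter

def majority_vote(labeler_outputs, min_votes=None):
    """Aggregate multi-label annotations from several LLM labelers.

    labeler_outputs: one set of category labels per labeler.
    A label survives only if at least `min_votes` labelers assigned it
    (default: a strict majority). Hypothetical sketch; the paper's
    actual aggregation rule may differ.
    """
    n = len(labeler_outputs)
    needed = min_votes if min_votes is not None else n // 2 + 1
    counts = Counter(label for labels in labeler_outputs for label in set(labels))
    return {label for label, count in counts.items() if count >= needed}

# Example: three labelers, one disagrees on an extra label.
agreed = majority_vote([{"hate"}, {"hate", "violence"}, {"hate"}])
# "hate" has 3 votes (kept); "violence" has 1 vote (dropped)
```

Majority voting of this kind reduces individual-labeler noise before the human-oversight pass; disagreement cases are natural candidates for the human review that underpins the reported 0.70–0.80 inter-annotator agreement.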
Paper Type: Long
Research Area: Safety and Alignment in LLMs
Research Area Keywords: Efficient/Low-Resource Methods for NLP, Machine Translation, Multilingualism and Cross-Lingual NLP, Resources and Evaluation, Language Modeling
Contribution Types: Approaches to low-resource settings, Data resources
Languages Studied: English, Chinese, Malay, Tamil
Submission Number: 1685