Lost in Localization: Building RabakBench with Human-in-the-Loop Validation to Measure Multilingual Safety Gaps
Keywords: AI Safety, Multilingual, Benchmark, Machine Translation, Low-Resource, Synthetic Data Generation
Abstract: Large language models (LLMs) often fail to maintain safety in low-resource language varieties, such as code-mixed vernaculars and regional dialects. We introduce RabakBench, a multilingual safety benchmark and scalable pipeline localized to Singapore’s unique linguistic landscape, covering Singlish, Chinese, Malay, and Tamil. We construct the benchmark through a novel three-stage pipeline: (1) Generate: augmenting real-world unsafe web content via LLM-driven red teaming; (2) Label: applying semi-automated multi-label annotation using majority-voted LLM labelers; and (3) Translate: performing high-fidelity, toxicity-preserving translation. The resulting dataset contains over 5,000 examples across six fine-grained safety categories. Despite using LLMs for scalability, our framework maintains rigorous human oversight, achieving 0.70–0.80 inter-annotator agreement. Evaluations of 13 state-of-the-art guardrails reveal significant performance degradation, underscoring the need for localized evaluation. RabakBench provides a reproducible framework for building safety benchmarks in underserved communities.
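The Label stage's majority-voted aggregation can be illustrated with a minimal sketch. This is a hypothetical implementation, not the paper's actual code: it assumes each LLM labeler emits a set of category labels per example and keeps a label only when a strict majority of labelers assigned it.

```python
from collections import Counter

def majority_vote(labeler_outputs, min_votes=None):
    """Aggregate multi-label annotations from several LLM labelers.

    labeler_outputs: one set of category labels per labeler.
    A label survives only if at least `min_votes` labelers assigned it
    (default: a strict majority). Hypothetical sketch; the paper's
    actual aggregation rule may differ.
    """
    n = len(labeler_outputs)
    needed = min_votes if min_votes is not None else n // 2 + 1
    counts = Counter(label for labels in labeler_outputs for label in set(labels))
    return {label for label, count in counts.items() if count >= needed}

# Example: three labelers, one disagrees on an extra label.
agreed = majority_vote([{"hate"}, {"hate", "violence"}, {"hate"}])
# "hate" has 3 votes (kept); "violence" has 1 vote (dropped)
```

Majority voting of this kind reduces individual-labeler noise before the human-oversight pass; disagreement cases are natural candidates for the human review that underpins the reported 0.70–0.80 inter-annotator agreement.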
Paper Type: Long
Research Area: Safety and Alignment in LLMs
Research Area Keywords: Efficient/Low-Resource Methods for NLP, Machine Translation, Multilingualism and Cross-Lingual NLP, Resources and Evaluation, Language Modeling
Contribution Types: Approaches to low-resource settings, Data resources
Languages Studied: English, Chinese, Malay, Tamil
Submission Number: 1685