Keywords: LLM safety, safeguard models, Southeast Asian languages, cultural grounding, safety benchmark, multilingual evaluation, low-resource languages
Abstract: Safeguard models help large language models (LLMs) detect and block harmful content, but most evaluations remain English-centric and overlook linguistic and cultural diversity. Existing multilingual safety benchmarks often rely on machine-translated English data, which fails to capture nuances in low-resource languages. Southeast Asian (SEA) languages are underrepresented despite the region’s linguistic diversity and unique safety concerns, from culturally sensitive political speech to region-specific misinformation. Addressing these gaps requires benchmarks that are natively authored to reflect local norms and harm scenarios. We introduce SEA-SafeguardBench, the first human-verified safety benchmark for SEA, covering eight languages and 21,640 samples across three subsets: general, in-the-wild, and content generation. Experimental results on our benchmark demonstrate that even state-of-the-art LLMs and guardrails are challenged by SEA cultural and harm scenarios and underperform compared with English texts.
Paper Type: Long
Research Area: Safety and Alignment in LLMs
Research Area Keywords: safety and alignment, multilingual benchmarks, datasets for low resource languages
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Approaches to low-resource settings, Data resources
Languages Studied: Indonesian, Malay, Burmese, Thai, Tamil, Tagalog, Vietnamese
Submission Number: 20