UbuntuGuard: A Policy-Based Safety Benchmark for Low-Resource African Languages

Published: 14 Dec 2025, Last Modified: 11 Jan 2026, LM4UC@AAAI2026, CC BY 4.0
Keywords: Safety, Evaluation, Benchmarks, Guardrails, Multilingual Systems, Safety Systems, Low Resource Languages
TL;DR: We introduce UbuntuGuard, the first African policy-based safety benchmark, designed to evaluate the robustness of guardian models in culturally and linguistically diverse settings.
Abstract: Guardian models monitor and regulate the outputs of user-facing AI systems. However, current guardian models fall short in two key ways. First, they are predominantly Western-centric and optimized for high-resource languages, leaving low-resource African languages vulnerable to evolving harms, cross-lingual safety failures, and cultural misalignment. Second, most guardian models rely on rigid, predefined safety categories that do not generalize across diverse linguistic and sociocultural contexts. Ensuring robust safety requires flexible, runtime-enforceable policies and benchmarks that reflect local norms, harm scenarios, and cultural expectations. We introduce UbuntuGuard, the first African policy-based safety benchmark built from adversarial queries authored by 155 domain experts across sensitive fields, including healthcare, education, government, and finance. From these expert-crafted queries, we derive context-specific safety policies and reference responses that capture culturally grounded risk signals, enabling policy-aligned evaluation of guardian models. We evaluate six state-of-the-art guardian models, including static, dynamic, and multilingual variants, under multiple scenarios. Our findings reveal that existing English-centric benchmarks overestimate real-world multilingual safety, cross-lingual transfer provides partial but insufficient coverage, and dynamic models, while better equipped to leverage policies at inference time, still struggle in fully localized African-language contexts. These findings highlight the urgent need for multilingual, culturally grounded safety benchmarks to enable the development of reliable and equitable guardian models for low-resource languages.
Submission Number: 24