Track: Long Paper Track (up to 9 pages)
Keywords: AI Safety, Benchmark, Large Language Models, Multilingual
Abstract: Building safe Large Language Models (LLMs) across multiple languages is essential to ensuring both safe access and linguistic diversity. To this end, we introduce M-ALERT, a multilingual benchmark that evaluates the safety of LLMs in five languages: English, French, German, Italian, and Spanish. M-ALERT includes 15k high-quality prompts per language, totaling 75k, following the detailed ALERT taxonomy. Our extensive experiments on 10 state-of-the-art LLMs highlight the importance of language-specific safety analysis, revealing that models often exhibit significant inconsistencies in safety across languages and categories. For instance, Llama3.2 shows high unsafety in the category crime_tax for Italian but remains safe in the other languages. Similar differences can be observed across all models. In contrast, certain categories, such as substance_cannabis and crime_propaganda, consistently trigger unsafe responses across models and languages. These findings underscore the need for robust multilingual safety practices in LLMs to ensure responsible usage across diverse communities.
Submission Number: 41