Abstract: As large language models (LLMs) are deployed in multilingual settings, their safety behavior in culturally diverse, low-resource languages remains poorly understood. We present the first systematic evaluation of LLM safety across 12 Indic languages, spoken by over 1.2 billion people but underrepresented in LLM training data. Using a dataset of 6,000 culturally grounded prompts spanning caste, religion, gender, health, and politics, we assess 10 leading LLMs on translated variants of each prompt.
Our analysis reveals significant safety drift: cross-language agreement is just 12.8\%, and SAFE rate variance exceeds 17\% across languages. Some models over-refuse benign prompts in low-resource scripts and over-flag politically sensitive topics, while others fail to flag unsafe generations. We quantify these failures using prompt-level entropy, category bias scores, and multilingual consistency indices.
Our findings highlight critical safety generalization gaps in multilingual LLMs and show that safety alignment does not transfer evenly across languages. We release IndicSafe, the first benchmark to enable culturally informed safety evaluation for Indic deployments, and advocate for language-aware alignment strategies grounded in regional harms.
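To make the consistency metrics concrete, the sketch below illustrates one plausible reading of cross-language agreement and prompt-level entropy. The abstract does not define these metrics, so the function names, the SAFE/UNSAFE label scheme, and the exact formulas here are illustrative assumptions, not the paper's implementation.

```python
from collections import Counter
from math import log2

# Hypothetical helpers sketching the abstract's consistency metrics.
# The paper's actual definitions may differ.

def cross_language_agreement(labels_by_lang: dict[str, list[str]]) -> float:
    """Fraction of prompts whose safety label (e.g., SAFE/UNSAFE)
    is identical across all evaluated languages."""
    langs = list(labels_by_lang)
    n_prompts = len(labels_by_lang[langs[0]])
    agree = sum(
        len({labels_by_lang[lang][i] for lang in langs}) == 1
        for i in range(n_prompts)
    )
    return agree / n_prompts

def prompt_level_entropy(labels: list[str]) -> float:
    """Shannon entropy of one prompt's labels across languages:
    0 means fully consistent; higher values mean more safety drift."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())

# Example: two prompts judged in three languages.
labels_by_lang = {
    "hi": ["SAFE", "UNSAFE"],
    "bn": ["SAFE", "SAFE"],
    "ta": ["UNSAFE", "SAFE"],
}
print(cross_language_agreement(labels_by_lang))          # 0.0
print(prompt_level_entropy(["SAFE", "SAFE", "UNSAFE"]))  # ~0.918
```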
Paper Type: Long
Research Area: Multilingualism and Cross-Lingual NLP
Research Area Keywords: multilinguality, safety evaluation, low-resource languages, Indic languages, less-resourced languages, resources for less-resourced languages
Contribution Types: Model analysis & interpretability, Approaches to low-resource settings, Data resources, Data analysis
Languages Studied: Hindi, Bengali, Odia, Tamil, Telugu, Kannada, Malayalam, Marathi, Gujarati, Punjabi, Nepali, Urdu
Reassignment Request Area Chair: This is not a resubmission
Reassignment Request Reviewers: This is not a resubmission
Data: zip
A1 Limitations Section: This paper has a limitations section.
A2 Potential Risks: N/A
B Use Or Create Scientific Artifacts: Yes
B1 Cite Creators Of Artifacts: Yes
B1 Elaboration: Section 3.1, Section 3.2
B2 Discuss The License For Artifacts: Yes
B2 Elaboration: Section 3.3
B3 Artifact Use Consistent With Intended Use: Yes
B3 Elaboration: Section 3.3
B4 Data Contains Personally Identifying Info Or Offensive Content: N/A
B5 Documentation Of Artifacts: Yes
B5 Elaboration: Section 3.1
B6 Statistics For Data: Yes
B6 Elaboration: Section 3.3
C Computational Experiments: Yes
C1 Model Size And Budget: Yes
C1 Elaboration: Section 4.1
C2 Experimental Setup And Hyperparameters: Yes
C2 Elaboration: Section 4.1
C3 Descriptive Statistics: Yes
C3 Elaboration: Section 6
C4 Parameters For Packages: N/A
D Human Subjects Including Annotators: Yes
D1 Instructions Given To Participants: Yes
D1 Elaboration: Appendix A.1, Appendix A.5
D2 Recruitment And Payment: No
D2 Elaboration: Annotation was performed in-house by full-time company employees assigned to this work.
D3 Data Consent: N/A
D4 Ethics Review Board Approval: N/A
D5 Characteristics Of Annotators: N/A
E AI Assistants In Research Or Writing: No
E1 Information About Use Of AI Assistants: N/A
Author Submission Checklist: Yes
Submission Number: 1212