Keywords: Cultural alignment, Multilingual LLMs, Cross-cultural NLP
Abstract: Current alignment strategies increasingly rely on reasoning-based evaluations and safety fine-tuning to improve robustness and mitigate bias. We challenge the efficacy of these paradigms in cross-cultural contexts through a large-scale diagnostic study of Large Language Models. Using over 820,000 data points derived from authoritative surveys across the Middle East and North Africa (MENA), we probe the internal representations and reasoning dynamics of seven diverse models. Our analysis uncovers three systematic failures. First, we identify reasoning-induced degradation: prompting models to explain their reasoning is associated with decreased cultural alignment scores. Second, we reveal logit leakage: models exhibit performative safety by refusing sensitive questions in generated text while simultaneously assigning high probability mass (>75%) to biased answers in their internal distributions. Third, we demonstrate linguistic determinism: internal representations collapse diverse nations into simplistic clusters based solely on language family, overriding actual cultural heterogeneity. These findings suggest that current multilingual alignment is superficial, relying on linguistic proxies rather than genuine cultural understanding. We release the MENAValues diagnostic suite to facilitate further research into the interpretability and faithfulness of cross-cultural alignment.
Paper Type: Long
Research Area: Ethics, Bias, and Fairness
Research Area Keywords: model bias/fairness evaluation, language/cultural bias analysis, ethical considerations in NLP applications, transparency
Contribution Types: Model analysis & interpretability
Languages Studied: English, Persian, Turkish, Arabic
Submission Number: 7255