Keywords: Cross-lingual fairness, LLM moral reasoning, Distributional drift, Subgroup robustness, Behavioral disparity, Fairness monitoring, Moral philosophy (Deontology, Consequentialism, Virtue Ethics), Moral Uncertainty Index (MUI), Cultural Grounding and Reasoning Index (CGRI), Fairness Instability Score (FIS)
TL;DR: LLMs fail subgroup robustness in moral reasoning across languages. Consequentialist bias increases in non-English contexts, while cultural grounding collapses. We introduce the FIS metric to monitor drift in multilingual systems.
Abstract: As large language models (LLMs) are deployed across linguistically and culturally
diverse populations, their ethical reasoning must remain robust across demographic
subgroups, a prerequisite for fairness under the distributional shifts that deployed
systems encounter as user populations evolve. We present a framework for detecting
and quantifying cross-population behavioral disparity in LLM moral reasoning
across four languages (English, Spanish, Korean, and Mandarin), each representing
a distinct cultural subpopulation. Using a seven-pillar evaluation rubric spanning
deontological, consequentialist, and virtue-ethical reasoning alongside coherence,
context sensitivity, moral uncertainty (MUI), and cultural grounding (CGRI), we
evaluate five LLMs on 50 moral dilemmas. Our results reveal subgroup robustness
failures: consequentialist bias amplifies in non-English contexts (mean disparity
∆ = +0.11), cultural grounding collapses by up to 88% across languages, and
behavioral consistency varies by model. We introduce disparity metrics that quantify
behavioral instability across populations and show that current LLMs fail to
maintain equitable ethical reasoning when serving linguistically diverse subgroups.
These findings establish language as a critical axis for fairness auditing and as a
leading indicator of behavioral drift risk in deployed moral reasoning systems.
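The abstract reports a mean cross-lingual disparity of ∆ = +0.11 on the consequentialist pillar but does not define the metric here. The sketch below is an illustrative assumption, not the paper's actual FIS formula: it computes the mean signed gap between each non-English language's pillar score and the English baseline, with made-up per-language scores chosen only to show how a ∆ of +0.11 could arise.

```python
def mean_disparity(scores_by_lang, baseline="en"):
    """Mean signed gap between each non-baseline language's pillar scores
    and the baseline's, averaged over all (language, pillar) pairs.
    Hypothetical stand-in for the paper's disparity metric."""
    base = scores_by_lang[baseline]
    gaps = []
    for lang, scores in scores_by_lang.items():
        if lang == baseline:
            continue
        for pillar, score in scores.items():
            gaps.append(score - base[pillar])
    return sum(gaps) / len(gaps)

# Illustrative (invented) consequentialist-pillar scores in [0, 1] per language.
scores = {
    "en": {"consequentialist": 0.60},
    "es": {"consequentialist": 0.72},
    "ko": {"consequentialist": 0.70},
    "zh": {"consequentialist": 0.71},
}
print(round(mean_disparity(scores), 2))  # → 0.11
```

A positive value indicates the non-English languages score higher on the pillar than English, matching the direction of the amplification the abstract describes.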
Submission Number: 115