Mimicking Footprints or Genuine Understanding? Exploring Cultural Representation Bias in Large Language Models

ACL ARR 2025 February Submission8166 Authors

16 Feb 2025 (modified: 09 May 2025) · ACL ARR 2025 February Submission · CC BY 4.0
Abstract: This study investigates Large Language Models' (LLMs) capacity for cross-cultural understanding in moral reasoning tasks, examining whether their performance reflects genuine comprehension or sophisticated pattern matching. Given documented biases in LLMs' training data toward English-language content and Western perspectives, we evaluated five widely deployed models (Gemini, GPT-4, Llama 3 8B, Llama 3.1 8B, and Mistral 7B) on three datasets that capture variation along cultural dimensions: the World Values Survey (WVS), the Moral Machine experiment, and the COVID-19 Vaccine Hesitancy survey. Our analysis yielded three key findings: (1) While human responses showed clear cultural clustering, particularly in the WVS and Vaccine Hesitancy datasets, LLMs failed to replicate these distinct cultural groupings, suggesting limitations in capturing the underlying cultural dynamics. (2) Cultural representation bias varied significantly by model architecture ($F=47.70\text{-}416.88$, $p<.001$) and cultural context ($F=4.34\text{-}13.09$, $p<.001$): GPT-4 showed consistent bias levels (22\%-31\%) across datasets, while Llama 3 achieved the lowest bias on the WVS (17\%). (3) Demographic-cultural interactions varied unexpectedly across datasets and models, most notably in Orthodox Europe, where the otherwise top-performing Llama 3 showed increased bias while other models improved. These findings suggest that although LLMs can pattern-match effectively on simple moral reasoning tasks, they struggle with complex cross-cultural moral scenarios, indicating limits to their genuine understanding of cultural nuance.
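The abstract reports two-way effects of model architecture and cultural context on a scalar bias measure. As a minimal sketch of how such F-statistics are typically obtained (the paper's exact bias metric, sample sizes, and culture groupings are not given here, so the data below are synthetic and the labels illustrative), a two-way ANOVA in Python could look like this:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

rng = np.random.default_rng(0)
models = ["Gemini", "GPT-4", "Llama3-8B", "Llama3.1-8B", "Mistral-7B"]
cultures = ["Orthodox Europe", "Protestant Europe", "Confucian", "Latin America"]

# Synthetic bias scores: 30 hypothetical evaluation runs per (model, culture) cell.
rows = [
    {"model": m, "culture": c, "bias": rng.normal(0.25, 0.05)}
    for m in models
    for c in cultures
    for _ in range(30)
]
df = pd.DataFrame(rows)

# Two-way ANOVA (main effects only): does bias vary by model and by culture?
fit = ols("bias ~ C(model) + C(culture)", data=df).fit()
print(sm.stats.anova_lm(fit, typ=2))
```

The `anova_lm` table reports an F value and p value per factor, matching the form of the statistics quoted in the abstract; on real data, finding (3) would instead call for a model with a `C(model):C(culture)` interaction term.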
Paper Type: Long
Research Area: Ethics, Bias, and Fairness
Research Area Keywords: model bias/fairness evaluation, ethical considerations in NLP applications, cultural bias
Contribution Types: Model analysis & interpretability
Languages Studied: English
Submission Number: 8166