Is It Bad to Work All the Time? Cross-Cultural Evaluation of Social Norm Biases in GPT-4

ACL ARR 2025 May Submission1274 Authors

17 May 2025 (modified: 03 Jul 2025)
License: CC BY 4.0
Abstract: LLMs have been shown to align with the values of Western or North American cultures. Prior work has predominantly demonstrated this effect using surveys that directly ask about values, originally posed to people and now also to LLMs. However, such surveys offer little evidence that LLMs would consistently apply those values in real-world scenarios. To address this, we take a bottom-up approach, asking LLMs to reason about cultural norms in narratives from different cultures. We find that GPT-4 tends to generate norms that, while not necessarily incorrect, are significantly less culture-specific. In addition, while it avoids overtly generating stereotypes, the stereotypical representations of certain cultures are merely hidden rather than suppressed in the model, and such stereotypes can be easily recovered. Addressing these challenges is a crucial step towards developing LLMs that fairly serve their diverse user base.
Paper Type: Long
Research Area: Ethics, Bias, and Fairness
Research Area Keywords: culture, social norms, bias, stereotypes
Contribution Types: Model analysis & interpretability
Languages Studied: English
Submission Number: 1274