Keywords: social bias, stereotypes, large language models, attribution theory
Abstract: When a student fails an exam, do we tend to blame their effort or the test’s difficulty? Attribution, defined as how reasons are assigned to event outcomes, shapes perceptions, reinforces stereotypes, and influences decisions. Attribution Theory explains how people attribute causes to internal factors (effort, ability) or external ones (task difficulty, luck). How LLMs attribute event outcomes based on demographics therefore carries important fairness implications. Most work on social biases in LLMs focuses on surface-level associations or isolated stereotypes. This work proposes a cognitively grounded bias evaluation framework to identify how disparities in models’ reasoning shape demographic bias across three contexts: single-actor, actor–actor, and actor–observer, capturing comparative and perspective-driven biases overlooked in prior work. We introduce a 140k-prompt benchmark covering ten scenarios and four social dimensions. Our analyses reveal attribution asymmetries across identities that vary in multi-actor and observer settings, suggesting that the presence of other identities in a scenario influences bias. This work underscores the need for cognitively grounded bias evaluation and informs future debiasing efforts through the proposed framework.
Paper Type: Long
Research Area: Ethics, Bias, and Fairness
Research Area Keywords: model bias/fairness evaluation
Contribution Types: Data resources, Data analysis
Languages Studied: English
Submission Number: 5992