Abstract: Bias in Large Language Models remains a critical concern as these systems are increasingly deployed in high-stakes applications. Yet most fairness evaluations rely on scalar metrics or single-model analysis, overlooking how biases align or diverge across model families, scales, and tuning strategies. In this work, we reframe bias similarity as a form of functional similarity and evaluate 24 LLMs from four major families on over one million structured prompts spanning four bias dimensions. Our findings reveal that fairness is not strongly determined by model size, architecture, instruction tuning, or openness. Instead, bias behaviors are highly context-dependent and structurally persistent, often resistant to current alignment techniques. Contrary to common assumptions, we find that open-source models frequently match or outperform proprietary models in both fairness and utility. These results call into question the default reliance on proprietary systems and highlight the need for behaviorally grounded, model-specific audits to better understand how bias manifests and endures across the LLM landscape.
Paper Type: Long
Research Area: Ethics, Bias, and Fairness
Research Area Keywords: data ethics; model bias/fairness evaluation; model bias/unfairness mitigation; ethical considerations in NLP applications; transparency; policy and governance; reflections and critiques
Contribution Types: Model analysis & interpretability
Languages Studied: English
Submission Number: 1243