Keywords: LLM, Bias and Fairness, Fairness Auditing, Bias Measurement
Abstract: Large Language Models (LLMs) reproduce social biases, yet prevailing evaluations
score models in isolation, obscuring how biases persist across families and
releases. We introduce Bias Similarity Measurement (BSM), which treats fairness
as a relational property between models, unifying scalar, distributional, behavioral,
and representational signals into a single similarity space. Evaluating 30
LLMs on 1M+ prompts, we find that instruction tuning primarily enforces abstention
rather than altering internal representations; small models gain little accuracy
and can become less fair under forced choice; and open-weight models can match
or exceed proprietary systems. Family signatures diverge: Gemma favors refusal,
LLaMA 3.1 approaches neutrality with fewer refusals, while the ecosystem overall
converges toward abstention-heavy behavior. Counterintuitively, Gemma 3 Instruct matches
GPT-4–level fairness at far lower cost, whereas Gemini’s heavy abstention suppresses
utility. Beyond these findings, BSM offers an auditing workflow for procurement,
regression testing, and lineage screening, and extends naturally to code
and multilingual settings. Our results reframe fairness not as isolated scores but
as comparative bias similarity, enabling systematic auditing of LLM ecosystems.
Code is available at https://anonymous.4open.science/r/bias_llm-0A8E.
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 4567