Towards Situated Bias Evaluations in LLM Alignment

ACL ARR 2024 June Submission2038 Authors

15 Jun 2024 (modified: 03 Jul 2024) · ACL ARR 2024 June Submission · CC BY 4.0
Abstract: The global adoption of chat-based large language models (LLMs) necessitates ensuring their inclusivity across diverse sociocultural contexts. Despite efforts to align these models with human preferences, it remains uncertain whether such alignment amplifies pre-existing social biases. Current bias evaluation frameworks are limited to narrow, hegemonic social contexts, such as binary gender biases in occupational associations, overlooking the diverse range of harms affecting marginalized communities. In this paper, we investigate aligned LLMs for biases along underrepresented evaluation dimensions such as gender-diverse representation and multilingual accessibility. Through a comprehensive evaluation of 12 models, we uncover several key findings: (1) gender-diverse disparities persist after alignment and can be measured both in extrinsic model outputs and in intrinsic reward analysis; (2) aligned models reflect linguistic norms that favor higher-resourced languages, potentially disadvantaging lower-resourced ones. Our findings highlight the need for more comprehensive bias evaluation frameworks developed in dialogue with diverse sociocultural contexts.
Paper Type: Long
Research Area: Ethics, Bias, and Fairness
Research Area Keywords: Ethics, Bias, and Fairness, Human-Centered NLP
Contribution Types: Approaches to low-resource settings, Data analysis, Position papers
Languages Studied: English analysis + multilingual evaluations (English, French, German, Spanish, Dutch, Hungarian, Italian)
Submission Number: 2038