“You are Beautiful, Body Image Stereotypes are Ugly!” BIStereo: A Benchmark to Measure Body Image Stereotypes in Language Models
Abstract: \textit{Warning: This paper contains examples that may be offensive.}
While a few high-quality bias benchmark datasets exist to address stereotypes in Language Models (LMs), a notable lack of focus remains on body image stereotypes. To bridge this gap, we propose BIStereo, a suite to uncover LMs' biases towards people with certain physical-appearance characteristics, namely \textit{skin complexion, body shape, height, attire,} and a \textit{miscellaneous category} covering \textit{hair texture, eye color, and more}. Our dataset comprises 40k sentence pairs designed to assess LMs' biased preference for certain body types. We further include 60k premise-hypothesis pairs designed to comprehensively assess LMs' preference for fair skin tone. Additionally, we curate 553 tuples, each consisting of a \textit{body image descriptor, gender, and a stereotypical attribute}, validated by a diverse pool of annotators for physical appearance stereotypes. We propose a metric, \metric{}, that captures the biased preference of LMs for certain body types over others. Using BIStereo, we assess the presence of body image biases in ten different language models, revealing significant biases in Muril, XLMR, Llama3, and Gemma. We further evaluate the LMs through downstream NLI and analogy tasks. Our NLI experiments highlight notable patterns in the LMs that align with the well-documented cognitive bias in humans known as \textbf{\textit{the Halo Effect}}.
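The abstract does not spell out how the metric (left as \metric{} above) is computed, but the keywords name a PLL score. As a minimal sketch, assuming a standard masked-LM pseudo-log-likelihood and treating the model name, the helper function, and the example sentence pair below as illustrative assumptions rather than the paper's released artifacts, a pairwise preference comparison could look like:

```python
# Sketch of a PLL-based pairwise comparison (not the authors' code):
# score each sentence of a minimal pair with a masked LM's
# pseudo-log-likelihood, then compare the two scores.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_name = "bert-base-uncased"  # assumption: any masked LM under test
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)
model.eval()

def pseudo_log_likelihood(sentence: str) -> float:
    """Sum of log p(token | rest), masking one token at a time."""
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    total = 0.0
    # Skip the special tokens at positions 0 and -1 ([CLS]/[SEP]).
    for i in range(1, len(ids) - 1):
        masked = ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, i]
        log_probs = torch.log_softmax(logits, dim=-1)
        total += log_probs[ids[i]].item()
    return total

# Hypothetical sentence pair contrasting two body-image descriptors;
# a higher PLL for one side indicates the model prefers that variant.
s_a = "The tall applicant impressed the interviewers."
s_b = "The short applicant impressed the interviewers."
print(pseudo_log_likelihood(s_a), pseudo_log_likelihood(s_b))
```

Aggregating the sign or magnitude of such per-pair score gaps over the 40k sentence pairs would yield a corpus-level preference statistic; how \metric{} actually normalizes or aggregates is not specified in this abstract.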
Paper Type: Long
Research Area: Ethics, Bias, and Fairness
Research Area Keywords: Bias Benchmark dataset, Body Image Stereotypes, PLL Score
Contribution Types: Model analysis & interpretability, Data resources
Languages Studied: English
Submission Number: 8468