“You are Beautiful, Body Image Stereotypes are Ugly!” BIStereo: A Benchmark to Measure Body Image Stereotypes in Language Models
Abstract: \textit{Warning: This paper contains examples that may be offensive.}
While a few high-quality bias benchmark datasets exist to address stereotypes in Language Models (LMs), a notable lack of focus remains on body image stereotypes. To bridge this gap, we propose BIStereo, a suite to uncover LMs' biases towards people with certain physical-appearance characteristics, namely \textit{skin complexion, body shape, height, attire,} and a \textit{miscellaneous category} covering \textit{hair texture, eye color, and more}. Our dataset comprises 40k sentence pairs designed to assess LMs' biased preference for certain body types. We further include 60k premise-hypothesis pairs designed to comprehensively assess LMs' preference for fair skin tone. Additionally, we curate 553 tuples, each consisting of a \textit{body image descriptor, gender, and a stereotypical attribute}, validated by a diverse pool of annotators for physical appearance stereotypes. We propose a metric, \metric{}, that captures the biased preference of LMs for certain body types over others. Using BIStereo, we assess the presence of body image biases in ten different language models, revealing significant biases in Muril, XLMR, Llama3, and Gemma. We further evaluate the LMs through downstream NLI and analogy tasks. Our NLI experiments highlight notable patterns in the LMs that align with the well-documented cognitive bias in humans known as \textbf{\textit{the Halo Effect}}.
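The abstract does not spell out how the metric (left as \metric{} above) is computed, but the keywords name a PLL score. As a minimal sketch, assuming a standard masked-LM pseudo-log-likelihood and treating the model name, the helper function, and the example sentence pair below as illustrative assumptions rather than the paper's released artifacts, a pairwise preference comparison could look like:

```python
# Sketch of a PLL-based pairwise comparison (not the authors' code):
# score each sentence of a minimal pair with a masked LM's
# pseudo-log-likelihood, then compare the two scores.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_name = "bert-base-uncased"  # assumption: any masked LM under test
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)
model.eval()

def pseudo_log_likelihood(sentence: str) -> float:
    """Sum of log p(token | rest), masking one token at a time."""
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    total = 0.0
    # Skip the special tokens at positions 0 and -1 ([CLS]/[SEP]).
    for i in range(1, len(ids) - 1):
        masked = ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, i]
        log_probs = torch.log_softmax(logits, dim=-1)
        total += log_probs[ids[i]].item()
    return total

# Hypothetical sentence pair contrasting two body-image descriptors;
# a higher PLL for one side indicates the model prefers that variant.
s_a = "The tall applicant impressed the interviewers."
s_b = "The short applicant impressed the interviewers."
print(pseudo_log_likelihood(s_a), pseudo_log_likelihood(s_b))
```

Aggregating the sign or magnitude of such per-pair score gaps over the 40k sentence pairs would yield a corpus-level preference statistic; how \metric{} actually normalizes or aggregates is not specified in this abstract.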
Paper Type: Long
Research Area: Ethics, Bias, and Fairness
Research Area Keywords: Bias Benchmark dataset, Body Image Stereotypes, PLL Score
Contribution Types: Model analysis & interpretability, Data resources
Languages Studied: English
Submission Number: 8468