Quantifying Social Biases Using Templates is Unreliable

Anonymous

17 Feb 2023 (modified: 05 May 2023) · ACL ARR 2023 February Blind Submission
Abstract: While large language models (LLMs) have enabled rapid advancements in NLP, they also propagate and amplify biases that negatively impact marginalized groups. To perform bias evaluation, previous works have utilized templates, which allow researchers to quantify model bias in the absence of appropriate bias benchmarks. Although template evaluation is a convenient diagnostic tool for understanding model deficiencies, it often relies on a limited and simplistic set of templates. In this paper, we study whether bias measurements are sensitive to the choice of templates used for benchmarking: we manually modify templates proposed in previous works in a meaning-preserving manner and measure the corresponding bias on four tasks. We find that bias values and the resulting conclusions vary considerably across template modifications, by 20% (NLI) to 250% (MLM) of the original task-specific measures. Our results indicate that quantifying fairness in LLMs, as done in current practice, can be brittle and needs to be approached with more care and caution. We will make our code and datasets publicly available upon acceptance.
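The experiment the abstract describes can be sketched in a few lines. The snippet below is a minimal illustration, not the authors' released code: it assumes the HuggingFace `transformers` fill-mask pipeline and an off-the-shelf `bert-base-uncased` model, and the templates, group terms, and gap-style bias score are made up for this example rather than taken from the paper.

```python
# A minimal sketch of the kind of experiment the abstract describes, NOT the
# authors' released code. Assumptions: the HuggingFace `transformers`
# fill-mask pipeline, an off-the-shelf `bert-base-uncased` MLM, and
# illustrative templates, group terms, and bias score chosen for this example.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

def mask_prob(template: str, target: str) -> float:
    """Probability the MLM assigns to `target` in the [MASK] slot."""
    # With `targets`, the pipeline restricts scoring to the given word.
    return fill(template, targets=[target])[0]["score"]

# One original template plus meaning-preserving paraphrases (all hypothetical).
templates = [
    "[MASK] people are good at math.",
    "[MASK] people are skilled at math.",
    "[MASK] people are good at mathematics.",
]

for t in templates:
    # Illustrative bias score: probability gap between two group terms.
    gap = mask_prob(t, "black") - mask_prob(t, "white")
    print(f"{t!r}: bias gap = {gap:+.4f}")
```

If the printed gaps differ substantially across these near-synonymous templates, that is the kind of brittleness the paper reports.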
Paper Type: short
Research Area: Ethics, Bias, and Fairness