Keywords: bias, evaluation, perplexity, fairness, negation
TL;DR: This study investigates whether LLMs acquire subtle linguistic biases, such as negation bias, by combining social science theory with NLP bias research and evaluating eight models on a novel challenge set using perplexity as the metric.
Abstract: Large Language Models trained on large-scale uncontrolled corpora often encode stereotypes and biases, which can surface as harmful text generation or biased associations. However, do they also pick up subtler linguistic patterns that can reinforce and communicate biases and stereotypes, as humans do? We aim to bridge theoretical insights from social science with bias research in NLP by designing controlled, theoretically motivated LLM experiments to elicit this type of bias. Our case study is negation bias, the human tendency to use negation when describing situations that challenge common stereotypes. We construct an evaluation dataset containing negated and affirmed stereotypical and anti-stereotypical sentences and evaluate eight language models using perplexity as a metric of model surprisal. We find that the autoregressive decoder models in our experiment exhibit this bias, while we do not find evidence for it among the stacked encoder models.
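To illustrate the kind of measurement the abstract describes, the minimal sketch below computes sentence-level perplexity with a causal language model and compares an affirmed sentence against its negated counterpart. This is not the authors' actual evaluation pipeline: the model name ("gpt2"), the example sentence pair, and the use of the Hugging Face transformers API are assumptions for illustration only.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical model choice; the abstract does not name the eight models evaluated.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def perplexity(sentence: str) -> float:
    """Return the perplexity a causal LM assigns to a single sentence."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        # With labels=input_ids, the model returns the mean token-level
        # cross-entropy loss; exponentiating it gives perplexity.
        loss = model(**enc, labels=enc["input_ids"]).loss
    return torch.exp(loss).item()

# Illustrative (invented) affirmed/negated pair; lower perplexity = less model surprisal.
pairs = [
    ("The engineer fixed the bug quickly.",
     "The engineer did not fix the bug quickly."),
]
for affirmed, negated in pairs:
    print(f"affirmed: {perplexity(affirmed):.2f}  negated: {perplexity(negated):.2f}")
```

Under the framing in the abstract, one would compare such perplexity scores across negated and affirmed versions of stereotypical and anti-stereotypical sentences to test whether negation is less surprising to the model in anti-stereotypical contexts.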
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the COLM Code of Ethics on https://colmweb.org/CoE.html
Author Guide: I certify that this submission complies with the submission instructions as described on https://colmweb.org/AuthorGuide.html
Submission Number: 1698