Selection Bias Induced Spurious Correlations in Large Language Models

Emily McMilin

Published: 21 Jul 2022 | Last Modified: 04 May 2025 | SCIS 2022 Poster

Keywords: spurious correlations, large language models, causal inference

TL;DR: Dataset selection bias can make two unconditionally independent variables dependent within the selected data, thereby inducing spurious correlations in real-world inference domains.

Abstract: In this work we show how large language models (LLMs) can learn statistical dependencies between otherwise unconditionally independent variables as a result of dataset selection bias. To demonstrate the effect, we developed a masked gender task that can be applied to BERT-family models. Applied to pre-trained (unmodified) BERT and RoBERTa large models, it reveals spurious correlations between predicted gender pronouns and a variety of seemingly gender-neutral variables, such as date and location. Finally, we provide an online demo that invites readers to experiment further.
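The core mechanism (selection bias making independent variables dependent) can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions: X and Y are two hypothetical binary variables that are independent in the full population, and a hypothetical selection rule keeps a sample with a probability that rises with both X and Y, so the selection indicator is a common effect (collider) of the two. It is not the paper's masked gender task, its data, or its models.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def simulate(n=100_000, seed=0):
    """Toy selection-bias simulation (hypothetical setup, not the paper's).

    X and Y are independent fair coin flips. Each sample is kept in the
    "dataset" with a probability that increases with both X and Y, so the
    selection indicator S is a collider of X and Y.
    """
    rng = random.Random(seed)
    all_x, all_y, sel_x, sel_y = [], [], [], []
    for _ in range(n):
        x = rng.randint(0, 1)
        y = rng.randint(0, 1)
        all_x.append(x)
        all_y.append(y)
        # Hypothetical selection rule: P(S=1 | x, y) = 0.05 + 0.45*x + 0.45*y
        if rng.random() < 0.05 + 0.45 * x + 0.45 * y:
            sel_x.append(x)
            sel_y.append(y)
    # Correlation in the full population vs. within the selected subset.
    return pearson(all_x, all_y), pearson(sel_x, sel_y)


full_corr, selected_corr = simulate()
print(f"corr(X, Y) in full population:  {full_corr:+.3f}")
print(f"corr(X, Y) in selected dataset: {selected_corr:+.3f}")
```

Under this particular rule the population correlation should come out near zero, while the analytic correlation within the selected subset is about -0.25. A model trained only on the selected data would see, and could learn, that dependence even though X and Y are independent outside the dataset.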
