Selection Bias Induced Spurious Correlations in Large Language Models

Published: 21 Jul 2022, Last Modified: 22 Oct 2023
Venue: SCIS 2022 Poster
Readers: Everyone
Keywords: spurious correlations, large language models, causal inference
TL;DR: Dataset selection bias can cause two unconditionally independent variables to become dependent, thus inducing spurious correlations in real-world inference domains.
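The claim above can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the paper's experiment: `x` and `y` are independent fair coin flips, and the hypothetical selection rule keeps a sample only if at least one of them is 1. The function names (`pearson`, `simulate`) and the selection rule are illustrative choices.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    var_x = sum((a - mean_x) ** 2 for a in xs)
    var_y = sum((b - mean_y) ** 2 for b in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n=100_000, seed=0):
    """Return (correlation in the full population, correlation after selection)."""
    rng = random.Random(seed)
    # x and y are drawn independently, so they are unconditionally independent.
    pairs = [(rng.randint(0, 1), rng.randint(0, 1)) for _ in range(n)]
    # Hypothetical selection rule (an assumption): a sample enters the
    # dataset only if at least one of the two variables equals 1.
    selected = [(x, y) for x, y in pairs if x == 1 or y == 1]
    full_corr = pearson([x for x, _ in pairs], [y for _, y in pairs])
    selected_corr = pearson([x for x, _ in selected], [y for _, y in selected])
    return full_corr, selected_corr


if __name__ == "__main__":
    full_corr, selected_corr = simulate()
    print(f"full population: {full_corr:+.3f}")
    print(f"after selection: {selected_corr:+.3f}")
```

In the full population the correlation is close to zero. After selection, the three surviving outcomes (1,0), (0,1), and (1,1) are equally likely, which gives a correlation of -0.5 in expectation even though the variables were generated independently.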
Abstract: In this work we show how large language models (LLMs) can learn statistical dependencies between otherwise unconditionally independent variables due to dataset selection bias. To demonstrate the effect, we developed a masked gender task that can be applied to BERT-family models. Applied to pre-trained (unmodified) BERT and RoBERTa large models, the task reveals spurious correlations between predicted gender pronouns and a variety of seemingly gender-neutral variables such as date and location. Finally, we provide an online demo, inviting readers to experiment further.
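One way a masked gender probe of this kind might look is sketched below. It is not the authors' task definition: the prompt template, the helper names (`make_prompt`, `pronoun_scores`, `female_share`, `load_fill_mask`), and the choice of `bert-base-uncased` as a small stand-in model are all assumptions made for illustration. The sketch relies on the Hugging Face `transformers` fill-mask pipeline, which is imported lazily so the helpers can be used without it.

```python
def make_prompt(year, mask_token="[MASK]"):
    """Build a hypothetical prompt that varies only a gender-neutral date."""
    return f"In {year}, {mask_token} became a doctor."


def pronoun_scores(text, fill_mask):
    """Return the model's scores for 'he' and 'she' at the masked position.

    `fill_mask` is any callable with the fill-mask pipeline interface:
    it takes a text and `targets`, and returns dicts with
    'token_str' and 'score' keys.
    """
    results = fill_mask(text, targets=["he", "she"])
    return {r["token_str"].strip(): r["score"] for r in results}


def female_share(scores):
    """Normalize the two pronoun scores so they sum to 1; return the 'she' part."""
    return scores["she"] / (scores["he"] + scores["she"])


def load_fill_mask(model_name="bert-base-uncased"):
    """Load a fill-mask pipeline (assumption: a small stand-in model)."""
    from transformers import pipeline  # lazy import; downloads the model on first use

    return pipeline("fill-mask", model=model_name)
```

A possible usage, assuming `transformers` is installed: call `fill_mask = load_fill_mask()`, then compute `female_share(pronoun_scores(make_prompt(year, fill_mask.tokenizer.mask_token), fill_mask))` for a range of years. If the share moves systematically with the year, the model has tied pronoun choice to a variable that is gender-neutral on its face.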
Confirmation: Yes
Community Implementations: 1 code implementation (CatalyzeX)