Keywords: large language models, causal inference, selection bias
TL;DR: Large language models (LLMs) can learn statistical dependencies between otherwise unconditionally independent variables.
Abstract: In this paper we motivate the causal mechanisms behind sample selection collider bias in Large Language Models (LLMs). We show that selection collider bias can be amplified in underspecified learning tasks, and that the magnitude of the resulting spurious correlations appears scale-agnostic. While selection collider bias can be pervasive and difficult to overcome, we describe a method that exploits the resulting spurious associations to measure when a model may be uncertain about its prediction, and demonstrate it on an extended version of the Winogender Schemas evaluation set.
Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/selection-collider-bias-in-large-language/code)