Latent Causal Probing: A Formal Perspective on Probing with Causal Models of Data

Charles Jin

Published: 10 Jul 2024, Last Modified: 26 Aug 2024, Venue: COLM, License: CC BY 4.0

Research Area: Data, Evaluation, Science of LMs, Human mind, brain, philosophy, laws and LMs, LMs and the world

Keywords: probing, causality, interpretability, understanding, world models

TL;DR: We propose a framework for probing based on structural causal models, which analyses the extent to which LMs are grounded over the latent variables of the data generation process.

Abstract: As language models (LMs) deliver increasing performance on a range of NLP tasks, *probing classifiers* have become an indispensable technique in the effort to better understand their inner workings. A typical setup involves (1) defining an auxiliary task consisting of a dataset of text annotated with labels, then (2) supervising small classifiers to predict the labels from the representations of a pretrained LM as it processes the dataset. A high probing accuracy is interpreted as evidence that the LM has learned to perform the auxiliary task as an unsupervised byproduct of its original pretraining objective. Despite the widespread usage of probes, however, the robust design and analysis of probing experiments remains a challenge. We develop a formal perspective on probing using *structural causal models* (SCM). Specifically, given an SCM which explains the distribution of tokens observed during training, we frame the central hypothesis as whether the LM has learned to represent the latent variables of the SCM. Empirically, we extend a recent study of LMs in the context of a synthetic grid-world navigation task, where having an exact model of the underlying causal structure allows us to draw strong inferences from the result of probing experiments. Our techniques provide robust empirical evidence for the ability of LMs to induce the latent concepts underlying text.
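The two-step probing setup described in the abstract can be illustrated with a minimal sketch. This is not the paper's code: the "representations" here are synthetic vectors standing in for LM hidden states, the latent variable is a single hypothetical binary label, and all names (`make_dataset`, `train_probe`, `evaluate_probe`) are assumptions made for illustration. It only shows the general shape of supervising a small classifier on frozen representations and reading off its held-out accuracy.

```python
import math
import random


def make_dataset(n, dim, rng):
    """Synthetic stand-in for LM representations (an assumption, not real
    model states): a hidden binary label shifts the first coordinate,
    and every coordinate also carries unit Gaussian noise."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        rep = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        rep[0] += 3.0 if label == 1 else -3.0
        data.append((rep, label))
    return data


def train_probe(data, dim, epochs=20, lr=0.1):
    """Fit a logistic-regression probe with plain stochastic gradient descent."""
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for rep, label in data:
            z = sum(wi * xi for wi, xi in zip(w, rep)) + b
            z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
            p = 1.0 / (1.0 + math.exp(-z))
            grad = p - label  # gradient of the log loss w.r.t. z
            w = [wi - lr * grad * xi for wi, xi in zip(w, rep)]
            b -= lr * grad
    return w, b


def evaluate_probe(data, w, b):
    """Fraction of examples whose label the probe predicts correctly."""
    correct = 0
    for rep, label in data:
        z = sum(wi * xi for wi, xi in zip(w, rep)) + b
        correct += int((z > 0) == (label == 1))
    return correct / len(data)


rng = random.Random(0)
dim = 8
train_data = make_dataset(200, dim, rng)   # step (1): labelled auxiliary dataset
test_data = make_dataset(200, dim, rng)
w, b = train_probe(train_data, dim)        # step (2): supervise a small classifier
probe_accuracy = evaluate_probe(test_data, w, b)
print(f"held-out probe accuracy: {probe_accuracy:.3f}")
```

In this toy setting the label is linearly encoded by construction, so a high accuracy is expected. With a real LM, the question of how far high probing accuracy licenses conclusions about what the model has learned is exactly what the paper's causal framing is meant to address.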

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the COLM Code of Ethics on https://colmweb.org/CoE.html

Author Guide: I certify that this submission complies with the submission instructions as described on https://colmweb.org/AuthorGuide.html

Submission Number: 1349
