Verb Conjugation in Transformers Is Determined by Linear Encodings of Subject Number

Published: 07 Oct 2023, Last Modified: 01 Dec 2023, EMNLP 2023 Findings
Submission Type: Regular Short Paper
Submission Track: Interpretability, Interactivity, and Analysis of Models for NLP
Submission Track 2: Language Modeling and Analysis of Language Models
Keywords: interpretability, analysis, representations, hidden vectors, syntax, subject-verb agreement, transformers, pre-trained models, language models, bert, causal analysis, causality, causal intervention, inlp
TL;DR: We show through causal intervention that Transformer language models conjugate verbs using an interpretable linear representation of subject number in hidden vectors.
Abstract: Deep architectures such as Transformers are sometimes criticized for having uninterpretable "black-box" representations. We use causal intervention analysis to show that, in fact, some linguistic features are represented in a linear, interpretable format. Specifically, we show that BERT's ability to conjugate verbs relies on a linear encoding of subject number that can be manipulated with predictable effects on conjugation accuracy. This encoding is found in the subject position at the first layer and the verb position at the last layer, but distributed across positions at middle layers, particularly when there are multiple cues to subject number.
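Below is a minimal sketch (not the authors' code) of the kind of causal intervention the abstract describes: shifting BERT's hidden states along a linear subject-number direction and checking the effect on verb conjugation at a masked position. The probe direction (`number_direction`, shown here as a random placeholder where a learned probe weight would go), the intervention layer (`LAYER`), and the strength (`ALPHA`) are illustrative assumptions, not values from the paper.

```python
# Sketch of a linear-encoding intervention on BERT hidden states, assuming a
# learned subject-number direction; all specific values below are placeholders.
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased").eval()

# Plural subject with a singular attractor ("cabinet"), a standard agreement probe.
sentence = "The keys to the cabinet [MASK] on the table."
inputs = tokenizer(sentence, return_tensors="pt")
mask_idx = (inputs.input_ids[0] == tokenizer.mask_token_id).nonzero().item()

# Hypothetical: a unit vector encoding subject number, e.g. the weight vector of
# a linear probe trained to separate singular from plural hidden states.
hidden_size = model.config.hidden_size
number_direction = torch.randn(hidden_size)            # stand-in for a learned direction
number_direction = number_direction / number_direction.norm()

LAYER = 8      # assumed layer at which to intervene
ALPHA = 10.0   # assumed intervention strength

def flip_number(module, module_inputs, output):
    # Push every position's hidden state along the (assumed) number direction,
    # moving the representation toward the opposite subject number.
    hidden = output[0] - ALPHA * number_direction
    return (hidden,) + output[1:]

handle = model.bert.encoder.layer[LAYER].register_forward_hook(flip_number)
with torch.no_grad():
    logits = model(**inputs).logits[0, mask_idx]
handle.remove()

# If the intervention succeeds, the singular form should gain probability mass.
for verb in ["is", "are"]:
    verb_id = tokenizer.convert_tokens_to_ids(verb)
    print(verb, logits[verb_id].item())
```

In this toy setup a successful intervention would raise the logit of the incorrect singular form relative to the correct plural form at the masked verb; with a random direction, as here, little systematic change is expected, which is exactly the contrast a learned probe direction is meant to reveal.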
Submission Number: 1084