Track: Extended abstract
Keywords: interpretability, pruning, compositional generalization
Abstract: Transformer language models have shown improvements on compositional generalization benchmarks, but we lack an understanding of how these models actually implement compositional generalization. In this work, we propose a method to identify the $\textit{compositional core}$: the key subnetwork that models use to generalize compositionally. We compare this compositional core against subnetworks from models that simply memorize tasks or rely on shallow distributional patterns. Our analysis reveals that the attention mechanisms in compositionally generalizing subnetworks behave distinctively, with a notable focus on the end-of-sequence (EOS) token. This finding suggests that language models may be using special tokens like EOS as registers to hold and manipulate sentence representations.
Submission Number: 38