Track: Extended abstract
Keywords: interpretability, pruning, compositional generalization
Abstract: Transformer language models have shown improvements on compositional generalization benchmarks, but we lack an understanding of how these models actually implement compositional generalization. In this work, we propose a method to identify the $\textit{compositional core}$: the key subnetwork that models use to generalize compositionally. We compare this compositional core against subnetworks from models that simply memorize tasks or rely on shallow distributional patterns. Our analysis reveals that the attention mechanisms in compositionally generalizing subnetworks behave distinctively, with a notable focus on the end-of-sequence (EOS) token. This finding suggests that language models may be using special tokens like EOS as registers to hold and manipulate sentence representations.
Submission Number: 38