Abstract: Compositional generalization, the systematic combination of known components into novel structures, is fundamental to flexible human cognition, yet the mechanisms that support it in neural networks remain poorly understood in both machine learning and cognitive science. Lake & Baroni (2023) showed that a compact encoder-decoder transformer can achieve simple forms of compositional generalization in a sequence arithmetic task. In this work, we identify and mechanistically interpret the circuit responsible for compositional generalization in such a model. Using causal ablations, we isolate the circuit and further show that this understanding enables precise activation edits that steer the model’s outputs predictably. We find that the circuit leverages a disentangled representation of position and token, so that functional transformations can be applied to positions in a token-independent manner. Our findings advance the understanding of how compositionality can emerge in neural networks and offer testable hypotheses for similar mechanisms in other neural architectures and compositional tasks. Code will be published after double-blind review.
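The abstract refers to causal ablations and precise activation edits used to steer the model. The paper's code is not yet released, so the following is only a minimal illustrative sketch of what an activation edit via a forward hook can look like in PyTorch; the two-layer encoder, the edited direction, and the target position are hypothetical placeholders, not the authors' model or method.

```python
# Minimal sketch (not the authors' code): steering a transformer by editing a
# single activation with a forward hook, in the spirit of activation patching.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for a compact transformer; the paper studies an encoder-decoder model.
encoder_layer = nn.TransformerEncoderLayer(d_model=32, nhead=4, batch_first=True)
model = nn.TransformerEncoder(encoder_layer, num_layers=2)
model.eval()  # disable dropout so clean and steered runs are comparable

edit_vector = torch.zeros(32)
edit_vector[0] = 1.0      # hypothetical direction, e.g. a position-like feature
target_position = 2       # sequence position whose activation we edit

def steer_hook(module, inputs, output):
    # Add the edit vector to the residual-stream activation at one position;
    # returning a tensor from a forward hook replaces the module's output.
    patched = output.clone()
    patched[:, target_position, :] += edit_vector
    return patched

handle = model.layers[0].register_forward_hook(steer_hook)

x = torch.randn(1, 5, 32)  # (batch, sequence, d_model) dummy input
with torch.no_grad():
    steered = model(x)
    handle.remove()
    clean = model(x)

# The edit applied after layer 0 propagates through layer 1 and changes the output.
print((steered - clean).abs().amax(dim=-1))
```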
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Elahe_Arani1
Submission Number: 5009