Circuit Explained: How Does a Transformer Perform Compositional Generalization

TMLR Paper 5009 Authors

01 Jun 2025 (modified: 06 Aug 2025) · Rejected by TMLR · CC BY 4.0
Abstract: Compositional generalization — the systematic combination of known components into novel structures — is fundamental to flexible human cognition, yet the mechanisms that enable it in neural networks remain poorly understood in both machine learning and cognitive science. Lake et al. (2023) showed that a compact encoder-decoder transformer can achieve simple forms of compositional generalization in a sequence arithmetic task. In this work, we identify and mechanistically interpret the circuit responsible for this behavior in such a model. Using causal ablations, we isolate the circuit and show that this understanding enables precise activation edits to steer the model’s outputs predictably. We find that the circuit performs function composition without encoding the specific semantics of any given function — instead, it leverages a disentangled representation of token position and identity to apply a general token remapping rule across an entire family of functions. Our findings advance the understanding of how compositionality can emerge in transformers and offer testable hypotheses for similar mechanisms in other architectures and compositional tasks. Code will be released after double-blind review.
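The "general token remapping rule" described in the abstract can be illustrated with a toy sketch. This is an assumption-laden illustration, not the paper's code: the function name `remap` and the example `index_map` are invented here. The point it demonstrates is that an output sequence can be assembled purely from slot indices, without the rule ever encoding what the tokens mean:

```python
def remap(tokens, index_map):
    """Slot-content routing: output slot i copies the token at input slot index_map[i].

    The rule depends only on positions (slots), never on token identity,
    so the same index_map applies uniformly across a family of functions.
    """
    return [tokens[j] for j in index_map]

# Toy "function" that surrounds its first argument with its second.
# The index_map [1, 0, 1] is hypothetical, chosen only for illustration.
print(remap(["A", "B"], [1, 0, 1]))  # → ['B', 'A', 'B']
print(remap(["X", "Y"], [1, 0, 1]))  # → ['Y', 'X', 'Y']
```

Note that swapping in new tokens leaves the routing untouched, which is the sense in which such a circuit can generalize compositionally to unseen token combinations.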
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: Dear Editor and Reviewers,

We thank the Editor for handling our submission and the reviewers for their constructive comments, which have helped us strengthen this work. We have substantially revised our manuscript to address the concerns raised. Below we summarize the major changes.

## 1. Expanded Validation on a Different Architecture

To address concerns about generalization beyond a single model instance, we retrained the model with new hyperparameters and an enriched dataset. Specifically:

- The new architecture uses 3 encoder/decoder layers (up from 2) and reduces the number of attention heads per layer from 8 to 4.
- The new dataset increases the maximum number of function arguments from 2 to 5 and extends the maximum output sequence length from 5 to 16 tokens.
- Using the same circuit discovery pipeline, we confirmed that a similar circuit emerges in the new model, with all core functional heads replicated (Results L116; Discussion L313–314; Appendix Figure 13).

## 2. Improved Circuit Explanation and Readability

To address comments about readability:

- We reorganized Section 3, adding an overview diagram (Figure 4) and cross-references to help readers track each attention head's role within the full circuit.
- We added detailed explanations distinguishing the keep-only-one-head and ablate-only-one-head methods, with new text and an illustrative figure (L240; Appendix L511–528, Figure 12).
- We expanded the explanation of the Index-On-LHS Tracing mechanism and clarified it in Figure 10.

## 3. Clarified Broader Implications: Function Representation vs. Token Routing

We clarified the broader implications of our findings by contrasting them with the established function-vector perspective. Our probing experiments show that the model does not encode function semantics as explicitly decodable vectors; instead, it implements function composition through a slot–content token-routing mechanism. We now position this as a complementary mechanism for how transformers can generalize compositionally (Abstract L8–13; Introduction L38–44; Discussion L337–345).

## 4. Clarified Scope and Future Directions

We clarified that our findings specifically address a form of sequence-remapping compositionality and do not claim to account for other forms (Introduction L41–44). Nonetheless, sequence remapping is a core component of many in-context learning behaviors, including word completion, object identification, and code completion. Verifying whether this routing-based mechanism also operates in large real-world LLMs remains an important open question and a priority for our future research. We believe that our workflow and results provide a practical foundation and template for identifying similar mechanisms in more complex models and tasks (Discussion L343–345).

**We believe these revisions strengthen the empirical support for our conclusions and make the conceptual contributions more explicit.** We thank the Action Editor again for overseeing this process and look forward to any further feedback.
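The keep-only-one-head and ablate-only-one-head methods mentioned in the letter can be sketched as follows. This is a minimal NumPy toy under the standard assumption that attention-head outputs combine additively into the residual stream; the variable names and random "head outputs" are illustrative inventions, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
n_heads, d_model = 4, 8
# Hypothetical per-head contributions to the residual stream at one position.
head_out = rng.standard_normal((n_heads, d_model))

def combined(head_out, mode="full", idx=None):
    """Sum head contributions under an ablation mode.

    mode="ablate": zero out head `idx`, keep the rest (tests necessity).
    mode="keep":   zero out every head except `idx` (tests sufficiency).
    """
    mask = np.ones(len(head_out), dtype=bool)
    if mode == "ablate":
        mask[idx] = False
    elif mode == "keep":
        mask[:] = False
        mask[idx] = True
    return head_out[mask].sum(axis=0)

full = combined(head_out)
# Ablating head 2 removes exactly head 2's contribution...
assert np.allclose(full - combined(head_out, "ablate", 2), head_out[2])
# ...while keeping only head 2 isolates that same contribution.
assert np.allclose(combined(head_out, "keep", 2), head_out[2])
```

The two modes are complementary probes: ablate-only-one-head asks whether the circuit breaks without a given head, while keep-only-one-head asks whether that head alone suffices to carry the behavior.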
Assigned Action Editor: ~Elahe_Arani1
Submission Number: 5009