Structure Development in List-Sorting Transformers

TMLR Paper 3613 Authors

01 Nov 2024 (modified: 11 Nov 2024) · Under review for TMLR · CC BY 4.0
Abstract: We present an analysis of the evolution of the attention head circuits in a list-sorting, attention-only transformer. Through various measures, we identify distinct developmental stages in the training process. In particular, depending on the training setup, we find that the attention heads can specialize into one of two different modes: vocabulary-splitting or copy-suppression. We study the robustness of these stages by systematically varying the training hyperparameters, model architecture, and training dataset. This leads us to discover features in the training data that are correlated with the kind of head specialization the model acquires.
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Laurent_Charlin1
Submission Number: 3613