Structure Development in List Sorting Transformers

Published: 23 Oct 2024, Last Modified: 24 Feb 2025 · NeurReps 2024 Poster · CC BY 4.0
Keywords: Developmental Interpretability, Copy Suppression, Head Specialization
TL;DR: An attention-only transformer trained on list sorting exhibits head specialization and copy-suppression.
Abstract: We present an analysis of the evolution of the QK and OV circuits of a list-sorting, attention-only transformer. Using various measures, we identify the developmental stages of the training process. In particular, we find two forms of head specialization that emerge later in training: vocabulary-splitting and copy-suppression. We study their robustness by varying the training hyperparameters and the model architecture.
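The page gives only the abstract, so as a rough illustration of what the QK and OV circuits refer to, here is a minimal sketch in PyTorch using randomly initialized stand-in weights. All shapes and weight names (W_E, W_U, W_Q, W_K, W_V, W_O) are assumptions for a generic attention-only transformer, not the paper's actual model; positional embeddings and layer norm are ignored.

```python
import torch

# Hypothetical shapes for a small attention-only transformer; the paper's
# exact architecture is not given on this page.
d_vocab, d_model, d_head = 64, 128, 32

# Randomly initialized stand-ins for the learned weights.
W_E = torch.randn(d_vocab, d_model)   # token embedding
W_U = torch.randn(d_model, d_vocab)   # unembedding
W_Q = torch.randn(d_model, d_head)    # query projection of one head
W_K = torch.randn(d_model, d_head)    # key projection of one head
W_V = torch.randn(d_model, d_head)    # value projection of one head
W_O = torch.randn(d_head, d_model)    # output projection of one head

# QK circuit: how strongly a query token attends to a key token,
# viewed directly in token space.
QK = W_E @ W_Q @ W_K.T @ W_E.T        # (d_vocab, d_vocab)

# OV circuit: how attending to a source token moves the output logits.
# Copy-suppression would show up as negative values on the diagonal
# (attending to a token pushes that same token's logit down).
OV = W_E @ W_V @ W_O @ W_U            # (d_vocab, d_vocab)

print(f"mean diagonal of OV circuit: {OV.diag().mean().item():.3f}")
```

With trained weights, inspecting the sign and magnitude of the OV circuit's diagonal over training checkpoints is one way such head specialization could be tracked.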
Submission Number: 26
