A Mechanistic Study of Transformers Training Dynamics

Ambroise Odonnat; Wassim Bouaziz; Vivien Cabannes

Published: 11 Jun 2026, Last Modified: 22 Jun 2026, Mech Interp Workshop ICML 2026 Poster, CC BY 4.0

Keywords: Circuit Analysis, Attribution Graphs, Feature Geometry

Other Keywords: Training Dynamics, Transformers, Visualization

TL;DR: We study the training dynamics of a small transformer model on a mathematical task, using a visualization sandbox that helps examine each layer of the model during the optimization process.

Abstract: Large-scale pretraining of transformers has been central to the success of foundation models. However, the scale of those models limits our understanding of the mechanisms at play during optimization. In this work, we study the training dynamics of transformers in a controlled and interpretable setting. On the sparse modular addition task, we demonstrate that specialized attention circuits, called *clustering heads*, can be implemented during gradient descent to solve the problem. Our experiments show that such pathways naturally emerge during training. By monitoring the evolution of tokens via a visual sandbox, we uncover a two-stage learning process and the occurrence of loss spikes due to the high curvature of normalization layers. Our findings provide several insights into patterns observed in more practical settings, such as the pretraining of large language models.

Submission Number: 148
