What Happens During the Loss Plateau? Understanding Abrupt Learning in Transformers

Published: 10 Jun 2025, Last Modified: 15 Jul 2025
Venue: MOSS@ICML2025
License: CC BY 4.0
Keywords: Abrupt learning, attention map, transformer training dynamics, interpretability, science of language models
TL;DR: Early-phase training of Transformers on algorithmic tasks shows a loss plateau during which a partial solution, repetition bias, and representation collapse emerge before a sudden drop in loss.
Abstract: Training Transformers on algorithmic tasks frequently demonstrates an intriguing *abrupt learning* phenomenon: an extended performance plateau followed by a sudden, sharp improvement. This work investigates the underlying mechanisms for such dynamics, primarily in shallow Transformers. We reveal that during the plateau, the model often develops an interpretable *partial solution* while simultaneously exhibiting a strong *repetition bias* in its outputs. This output degeneracy is accompanied by *internal representation collapse*, where hidden states across different tokens become nearly parallel. We further identify the slow learning of optimal attention maps as a key bottleneck. Hidden progress in attention configuration during the plateau precedes the eventual rapid convergence, and directly intervening on attention significantly alters plateau duration and the severity of repetition bias and representational collapse. We validate that these phenomena (repetition bias and representation collapse) are not artifacts of toy setups but also manifest in the early pre-training stage of LLMs such as Pythia and OLMo.
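The abstract quantifies representation collapse as hidden states across token positions becoming nearly parallel. A minimal sketch of one way to measure this is shown below: compute the mean pairwise cosine similarity of a layer's hidden states over an input sequence. This is not the authors' code; the checkpoint name, layer choice, and input string are illustrative assumptions.

```python
# Sketch: measuring representation collapse as mean pairwise cosine similarity
# of hidden states across token positions. Assumed checkpoint and input text.
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "EleutherAI/pythia-70m"  # assumption: any causal LM checkpoint works
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

text = "a b c a b c a b c"  # placeholder input; the paper uses algorithmic tasks
inputs = tok(text, return_tensors="pt")

with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# Last-layer hidden states: shape (seq_len, d_model)
h = out.hidden_states[-1][0]
h = torch.nn.functional.normalize(h, dim=-1)

# Pairwise cosine similarities between token representations (off-diagonal only).
sim = h @ h.T
n = sim.size(0)
off_diag = sim[~torch.eye(n, dtype=torch.bool)]
print(f"mean pairwise cosine similarity: {off_diag.mean():.3f}")
# Values near 1.0 indicate nearly parallel hidden states, i.e. collapse.
```

Tracking this quantity over training checkpoints would show it rising during the plateau and falling after the abrupt drop in loss, under the dynamics the abstract describes.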
Code: ipynb
Submission Number: 27