Keywords: Probing, Understanding high-level properties of models, Other
TL;DR: Large language models (LLMs) exhibit specialized rare-token neurons that follow a reproducible three-regime organization of influence in the final MLP layer.
Abstract: Large language models (LLMs) struggle with representing and generating rare tokens despite their importance in specialized domains. In this study, we identify a reproducible \textit{three-regime organization} of neuron influence in the final MLP layer for rare-token prediction, composed of: (i) a plateau of highly influential specialist neurons, (ii) a power-law regime of moderately influential neurons, and (iii) a rapid-decay regime of minimally contributing neurons. We show that neurons in the plateau and power-law regimes form coordinated subnetworks with distinct geometric and co-activation patterns, while exhibiting heavy-tailed weight distributions, consistent with predictions from Heavy-Tailed Self-Regularization (HT-SR) theory. These specialized subnetworks emerge dynamically during training, transitioning from a homogeneous initial state to a functionally differentiated architecture. Our findings reveal that LLMs spontaneously combine distributed power-law sensitivity with specialized rare-token processing. This mechanistic insight bridges theoretical predictions from sparse coding and superposition with empirical observations, offering pathways for interpretable model editing, efficiency optimization, and deeper understanding of emergent specialization in deep networks.
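The abstract does not specify how per-neuron influence is measured. The sketch below illustrates one plausible reading, assuming influence is estimated by ablating individual hidden units of the final MLP layer and measuring the drop in a rare token's log-probability; the model (GPT-2), the prompt, the choice of rare token, and the ablation-based metric are all illustrative assumptions, not the paper's method. Sorting the resulting scores is the kind of curve on which the plateau / power-law / rapid-decay regimes would be read off.

```python
# Illustrative sketch (assumptions noted above): rank final-MLP neurons of a
# GPT-2 style model by the effect of ablating each one on a rare token's
# log-probability. This is NOT necessarily the paper's influence metric.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()
tok = GPT2Tokenizer.from_pretrained("gpt2")

prompt = "The enzyme catalyzing this reaction is"   # hypothetical prompt
# Take the first BPE piece of a rare word as the target (illustration only).
rare_token_id = tok.encode(" oxidoreductase")[0]

inputs = tok(prompt, return_tensors="pt")
final_mlp_fc = model.transformer.h[-1].mlp.c_fc      # final-layer MLP up-projection
n_neurons = final_mlp_fc.nf                          # hidden width (3072 for gpt2)

def rare_token_logprob():
    """Log-probability assigned to the rare token at the last position."""
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]
    return torch.log_softmax(logits, dim=-1)[rare_token_id].item()

baseline = rare_token_logprob()

def ablate(neuron_idx):
    """Zero one hidden unit of the final MLP during the forward pass."""
    def hook(module, inp, out):
        out[..., neuron_idx] = 0.0                   # GELU(0) = 0, so the unit is off
        return out
    return final_mlp_fc.register_forward_hook(hook)

influence = []
for i in range(n_neurons):                           # slow; for illustration only
    handle = ablate(i)
    influence.append(baseline - rare_token_logprob())
    handle.remove()

# Sorting influences in descending order gives the curve on which the three
# regimes described in the abstract (plateau, power-law, rapid decay) would
# appear as distinct segments.
ranked = sorted(influence, reverse=True)
print(ranked[:10])
```

In practice one would batch these ablations or use a gradient-based approximation rather than one forward pass per neuron; the loop above is kept naive for clarity.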
Submission Number: 320