Grokking or Glitching? How Low-Precision Drives Slingshot Loss Spikes

Published: 29 May 2026, Last Modified: 31 May 2026
Venue: HiLD at ICML 2026 (Spotlight)
License: CC BY 4.0
Keywords: loss spike, grokking, neural collapse, numerical stability
TL;DR: Late-stage training instabilities, including the Slingshot mechanism and logit divergence, are caused by floating-point absorption errors that induce an exponential parameter growth termed Numerical Feature Inflation.
Abstract: Deep neural networks exhibit periodic loss spikes during unregularized long-term training, a phenomenon known as the "Slingshot Mechanism." Existing work usually attributes this to intrinsic optimization dynamics, but its triggering mechanism remains unclear. This paper proves that this phenomenon is a result of floating-point arithmetic precision limits. We show that finite-precision errors in cross-entropy computation can break the zero-sum constraint of gradients across classes and introduce a systematic drift in the parameter update of the classifier layer. This drift forms a positive feedback loop with the feature mean, which we call _Numerical Feature Inflation_ ($\mathcal{NFI}$). Our results reinterpret Slingshot as a numerical dynamic of finite-precision training and provide a testable explanation for the emergence of periodic loss spikes in late-stage unregularized training.
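The zero-sum claim in the abstract can be illustrated with a toy example. The sketch below is not the authors' experiment; it is a minimal illustration under stated assumptions: half precision is emulated by rounding every intermediate value with the standard-library `struct` format `e`, and the helper names `fp16` and `softmax_grad`, as well as the logits, are hypothetical choices made only for this example. In exact arithmetic the cross-entropy gradient with respect to the logits, softmax(z) minus the one-hot target, sums to zero across classes. When the correct class has a large margin, the small exponentials are absorbed in the softmax denominator, the true-class probability rounds to exactly 1, and the remaining gradient entries no longer cancel.

```python
import math
import struct


def fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def softmax_grad(logits, target, rnd):
    """Cross-entropy gradient w.r.t. the logits: softmax(z) - one_hot(target).

    Every intermediate result is passed through `rnd` to emulate a given
    precision (assumption: rounding after each operation is a reasonable
    stand-in for low-precision hardware arithmetic).
    """
    z_max = max(logits)
    exps = [rnd(math.exp(rnd(z - z_max))) for z in logits]
    denom = 0.0
    for e in exps:
        denom = rnd(denom + e)  # sequential accumulation; small terms can be absorbed
    probs = [rnd(e / denom) for e in exps]
    return [rnd(p - (1.0 if k == target else 0.0)) for k, p in enumerate(probs)]


# Hypothetical late-training logits: class 0 is correct with a large margin.
logits = [10.0, 0.0, 0.0, 0.0]

grad_fp64 = softmax_grad(logits, 0, lambda x: x)  # plain double precision
grad_fp16 = softmax_grad(logits, 0, fp16)         # emulated half precision

print("fp64 gradient sum:", sum(grad_fp64))
print("fp16 gradient sum:", sum(grad_fp16))
print("fp16 true-class gradient:", grad_fp16[0])
```

In the emulated half-precision run the true-class entry is exactly zero while the other entries stay positive, so the per-class gradients carry a net bias instead of cancelling. Whether this bias matches the magnitude of the drift analyzed in the paper is not something this toy example establishes.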
Submission Number: 181