Two-Scale Latent Dynamics for Recurrent-Depth Transformers

Published: 23 Sept 2025 · Last Modified: 17 Nov 2025 · UniReps 2025 · CC BY 4.0
Track: Extended Abstract Track
Keywords: recurrent depth, test-time compute, loop refinement, two-scale dynamics, early-exit
TL;DR: Recurrent-depth transformers can be seen as producing two-scale trajectories in latent space. We show that the smaller-scale refinement tends to rotate around fixed points, and we use this to introduce an early-exit strategy that outperforms existing strategies in accuracy and speed.
Abstract: Recurrent-depth transformers scale test-time compute by iterating latent computations before emitting tokens. We study the geometry of these iterates and argue for a simple, \emph{two-scale} operational picture: (i) within a looped block, updates act as \emph{small-scale refinements}; (ii) across consecutive blocks, states undergo a \emph{larger-scale drift}. Across training, our measurements show that loop steps become \emph{smaller} and increasingly \emph{orthogonal} to one another, indicating better local modeling of fine structure rather than merely pushing in a single direction. These dynamics motivate an early-exit mechanism based on the second-order difference of the model's step size, which we show outperforms the KL-divergence exit strategy of Geiping et al.~\citep{geiping2025scaling} and its naive first-order counterpart in performance, stability, and time-efficiency.
Submission Number: 86
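
The abstract describes an exit rule driven by the second-order difference of the loop step size but does not spell out the test. The following is a minimal, hypothetical PyTorch sketch of one way such a criterion could look: track the norm of each latent update within the looped block, form the discrete second-order difference of that step-size sequence, and stop iterating once it falls below a threshold. The function name, the `tol` threshold, and the exact stopping condition are illustrative assumptions, not the authors' implementation.

```python
import torch


def loop_with_second_order_exit(block, h, max_steps=32, tol=1e-3):
    """Iterate a recurrent-depth block, exiting early when the second-order
    difference of the latent step size stabilizes.

    `block`, `tol`, and the exact exit test are assumed stand-ins; the
    abstract only states that the criterion uses the second-order
    difference in step size.
    """
    step_sizes = []
    for t in range(max_steps):
        h_next = block(h)  # one small-scale refinement step within the loop
        step_sizes.append((h_next - h).norm().item())
        h = h_next
        if len(step_sizes) >= 3:
            # Discrete second-order difference of the step-size sequence.
            d2 = step_sizes[-1] - 2 * step_sizes[-2] + step_sizes[-3]
            if abs(d2) < tol:  # assumed thresholding rule
                break
    return h, t + 1


# Example usage with a toy "block" (assumed stand-ins):
# block = lambda h: 0.9 * h + 0.1 * torch.tanh(h)
# h0 = torch.randn(1, 16)
# h_final, steps_used = loop_with_second_order_exit(block, h0)
```

A naive first-order counterpart, as mentioned in the abstract, would threshold the step size itself rather than its second-order difference.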