Loop the Middle: Adaptive Depth Transformers via Selective Middle-Layer Recurrence
Keywords: large language models, reasoning, test-time compute, efficient LLMs, transformers
Abstract: Transformers process all layers uniformly despite empirical evidence that they develop a natural three-phase structure: specialized encoding layers, highly redundant middle layers, and specialized decoding layers. We propose Prefix-Loop-Suffix (PLS), an architecture that exploits this structure by partitioning a pretrained transformer into unique prefix and suffix layers (run once) and a weight-shared middle block (looped N times with adaptive per-token halting). Three enhancements distinguish PLS from prior looped architectures: a hazard-function exit gate that lets easy tokens halt early and hard tokens iterate longer; timestep encoding that closes the expressivity gap between looped and standard transformers; and cross-iteration residual connections that stabilize iterative refinement. Initialized from Ouro-1.4B and evaluated on MATH-500 and GSM8K, PLS targets 20-40% fewer FLOPs than uniform full-model looping, since the prefix and suffix layers execute only once, while matching or exceeding its accuracy. We further provide CKA-grounded analysis validating the three-phase hypothesis, probe hidden states for convergence dynamics, and characterize exit-gate behavior as a function of problem difficulty.
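For concreteness, below is a minimal PyTorch sketch of the forward pass the abstract describes: unique prefix and suffix layers run once around a weight-shared middle block, with a hazard-function exit gate, per-iteration timestep embeddings, and a cross-iteration residual. All module names, dimensions, the sigmoid gate and its 0.5 threshold, and the learned-embedding form of the timestep encoding are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the PLS forward pass described in the abstract. Module
# names, sizes, the gate form, and the timestep encoding are assumptions.
import torch
import torch.nn as nn


class PLSSketch(nn.Module):
    def __init__(self, d_model=512, n_prefix=2, n_suffix=2, max_loops=8):
        super().__init__()

        def layer():
            return nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)

        self.prefix = nn.ModuleList([layer() for _ in range(n_prefix)])
        self.middle = layer()  # single weight-shared middle block
        self.suffix = nn.ModuleList([layer() for _ in range(n_suffix)])
        # Hazard-function exit gate: per-token probability of halting at this
        # iteration, given the token has not halted yet (assumed sigmoid form).
        self.exit_gate = nn.Linear(d_model, 1)
        # Timestep encoding: one learned vector per loop iteration (assumed form).
        self.step_emb = nn.Embedding(max_loops, d_model)
        self.max_loops = max_loops

    def forward(self, x):  # x: (batch, seq, d_model)
        for blk in self.prefix:  # unique prefix layers, executed once
            x = blk(x)
        halted = torch.zeros(x.shape[:2], dtype=torch.bool, device=x.device)
        for t in range(self.max_loops):
            h = self.middle(x + self.step_emb.weight[t])
            # Cross-iteration residual; halted tokens stay frozen. A real
            # implementation would skip compute for halted tokens to realize
            # the FLOP savings rather than masking after the fact.
            x = torch.where(halted.unsqueeze(-1), x, x + h)
            hazard = torch.sigmoid(self.exit_gate(x)).squeeze(-1)
            halted |= hazard > 0.5  # easy tokens exit early
            if bool(halted.all()):
                break
        for blk in self.suffix:  # unique suffix layers, executed once
            x = blk(x)
        return x


if __name__ == "__main__":
    out = PLSSketch()(torch.randn(2, 16, 512))
    print(out.shape)  # torch.Size([2, 16, 512])
```

Note that this sketch only masks halted tokens; the 20-40% FLOP reduction the abstract targets additionally requires that prefix and suffix layers run once and that halted tokens be excluded from middle-block compute.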
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 184