Loop the Middle: Adaptive Depth Transformers via Selective Middle-Layer Recurrence
Keywords: large language models, reasoning, test-time compute, efficient LLMs, transformers
Abstract: Transformers process all layers uniformly despite empirical evidence that they develop a natural three-phase structure: specialized encoding layers, highly redundant middle layers, and specialized decoding layers. We propose Prefix-Loop-Suffix (PLS), an architecture that exploits this structure by partitioning a pretrained transformer into unique prefix and suffix layers (run once) and a weight-shared middle block (looped N times with adaptive per-token halting). Three enhancements distinguish PLS from prior looped architectures: a hazard-function exit gate that lets easy tokens halt early and hard tokens iterate longer; timestep encoding that closes the expressivity gap between looped and standard transformers; and cross-iteration residual connections that stabilize iterative refinement. Initialized from Ouro-1.4B and evaluated on MATH-500 and GSM8K, PLS targets 20-40% fewer FLOPs than uniform full-model looping, since the prefix and suffix layers execute only once, while matching or exceeding its accuracy. We further provide CKA-grounded analysis validating the three-phase hypothesis, probe hidden states for convergence dynamics, and characterize exit-gate behavior as a function of problem difficulty.
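For concreteness, below is a minimal PyTorch sketch of the forward pass the abstract describes: unique prefix and suffix layers run once around a weight-shared middle block, with a hazard-function exit gate, per-iteration timestep embeddings, and a cross-iteration residual. All module names, dimensions, the sigmoid gate and its 0.5 threshold, and the learned-embedding form of the timestep encoding are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the PLS forward pass described in the abstract. Module
# names, sizes, the gate form, and the timestep encoding are assumptions.
import torch
import torch.nn as nn


class PLSSketch(nn.Module):
    def __init__(self, d_model=512, n_prefix=2, n_suffix=2, max_loops=8):
        super().__init__()

        def layer():
            return nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)

        self.prefix = nn.ModuleList([layer() for _ in range(n_prefix)])
        self.middle = layer()  # single weight-shared middle block
        self.suffix = nn.ModuleList([layer() for _ in range(n_suffix)])
        # Hazard-function exit gate: per-token probability of halting at this
        # iteration, given the token has not halted yet (assumed sigmoid form).
        self.exit_gate = nn.Linear(d_model, 1)
        # Timestep encoding: one learned vector per loop iteration (assumed form).
        self.step_emb = nn.Embedding(max_loops, d_model)
        self.max_loops = max_loops

    def forward(self, x):  # x: (batch, seq, d_model)
        for blk in self.prefix:  # unique prefix layers, executed once
            x = blk(x)
        halted = torch.zeros(x.shape[:2], dtype=torch.bool, device=x.device)
        for t in range(self.max_loops):
            h = self.middle(x + self.step_emb.weight[t])
            # Cross-iteration residual; halted tokens stay frozen. A real
            # implementation would skip compute for halted tokens to realize
            # the FLOP savings rather than masking after the fact.
            x = torch.where(halted.unsqueeze(-1), x, x + h)
            hazard = torch.sigmoid(self.exit_gate(x)).squeeze(-1)
            halted |= hazard > 0.5  # easy tokens exit early
            if bool(halted.all()):
                break
        for blk in self.suffix:  # unique suffix layers, executed once
            x = blk(x)
        return x


if __name__ == "__main__":
    out = PLSSketch()(torch.randn(2, 16, 512))
    print(out.shape)  # torch.Size([2, 16, 512])
```

Note that this sketch only masks halted tokens; the 20-40% FLOP reduction the abstract targets additionally requires that prefix and suffix layers run once and that halted tokens be excluded from middle-block compute.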
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 184