T2MLR: Transformer with Temporal Middle-Layer Recurrence
Track: long paper (up to 10 pages)
Keywords: Latent Reasoning, Middle Layer, Continuous Chain-of-Thought, Reasoning, Recurrent Neural Networks, Looped Transformers
TL;DR: We propose T2MLR, a Transformer that adds temporal recurrence at its middle layers, feeding the previous token's intermediate representations into earlier layers of the current token to improve stateful reasoning behavior.
Abstract: We introduce Transformers with Temporal Middle-Layer Recurrence (T$^2$MLR), a generalized Transformer architecture that integrates attention and recurrence by routing a lightweight temporal pathway through the middle layers. Motivated by latent-reasoning and looped-Transformer lines of work, T$^2$MLR injects intermediate representations from deeper layers of the previous token into earlier layers of the current token via a gated recurrent pathway, enabling iterative latent computation while preserving dense, token-level supervision.
Across natural-language pretraining and multi-hop reasoning fine-tuning, T$^2$MLR consistently outperforms parameter-matched Transformer baselines at the same inference compute. Moreover, we find that looping only a middle-layer block (as little as 20\% of all layers) often outperforms full-layer looping. This offers a new perspective on latent reasoning in Transformers: effective iterative refinement does not necessarily require full-stack recurrence; it can be achieved more effectively through targeted middle-layer recurrence.
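To make the pathway described in the abstract concrete, below is a minimal PyTorch sketch, not the authors' implementation. The layer indices (`inject_layer=3`, `read_layer=7`), the sigmoid gate, and the one-token shift are illustrative assumptions; the sketch omits token embeddings, positional information, and causal masking, and models a single recurrent step per forward pass.

```python
import torch
import torch.nn as nn

class T2MLRSketch(nn.Module):
    """Illustrative sketch: route a gated copy of a deeper layer's output at
    token t-1 into an earlier layer's input at token t."""

    def __init__(self, d_model=256, n_heads=4, n_layers=10,
                 inject_layer=3, read_layer=7):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            for _ in range(n_layers)
        )
        self.inject_layer = inject_layer  # earlier layer receiving the recurrent signal
        self.read_layer = read_layer      # deeper layer whose output is carried forward
        self.gate = nn.Linear(2 * d_model, d_model)  # gated recurrent pathway

    def forward(self, x, prev_state=None):
        # x: (batch, seq, d_model); prev_state: read-layer output shifted so
        # that position t holds the representation of token t-1.
        state = None
        for i, layer in enumerate(self.layers):
            if i == self.inject_layer and prev_state is not None:
                g = torch.sigmoid(self.gate(torch.cat([x, prev_state], dim=-1)))
                x = x + g * prev_state  # inject previous token's deep features
            x = layer(x)
            if i == self.read_layer:
                # Shift right by one token; zeros at position 0 (no predecessor).
                state = torch.cat([torch.zeros_like(x[:, :1]), x[:, :-1]], dim=1)
        return x, state

# Usage: the first pass has no recurrent signal; feeding the returned state
# back in unrolls one additional step of latent iteration.
model = T2MLRSketch()
x = torch.randn(2, 16, 256)
h, state = model(x)
h2, _ = model(x, prev_state=state)
```

Because the shift is applied in parallel across the sequence, this formulation keeps dense token-level supervision during training, in line with the abstract's description.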
Anonymization: This submission has been anonymized for double-blind review by removing identifying information such as names, affiliations, and URLs.
Presenter: ~Xingyu_Zhu1
Format: Yes, the presenting author will attend in person if this work is accepted to the workshop.
Funding: No, the presenting author of this submission does *not* fall under ICLR’s funding aims, or has sufficient alternate funding.
Submission Number: 96