Keywords: Recursion, Jigsaw, TRM
TL;DR: Tiny Recursive Models solve larger jigsaw puzzles more reliably than size-matched encoder-only Transformers by iteratively refining a latent state
Abstract: Chain-of-Thought (CoT) has demonstrated that explicit reasoning steps enhance large language model performance, yet this typically requires computationally expensive token sequences. In this work, we investigate Tiny Recursive Models (TRM), which internalize reasoning via iterative refinement of a latent "thought" vector. We benchmark the accuracy of TRM against standard encoder-only Transformers (EOT) on the task of jigsaw puzzle reconstruction, a domain requiring robust global spatial reasoning. While both architectures perform comparably on trivial grids up to $3 \times 3$, EOT performance collapses as complexity increases. In contrast, TRM maintains robust scaling under a tight parameter budget. Furthermore, TRMs exhibit "abrupt learning" phase transitions during training, suggesting that latent recursion enables a qualitative leap in reasoning depth unattainable by simply stacking transformer layers.
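The abstract describes TRM as refining a latent "thought" vector over repeated steps rather than emitting explicit reasoning tokens. Below is a minimal sketch of that idea, assuming a single shared recurrent update applied for a fixed number of steps and a per-position readout over puzzle placements; the class and parameter names (TinyRecursiveSketch, n_steps, etc.) are illustrative and not taken from the submission, and the sketch is not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class TinyRecursiveSketch(nn.Module):
    """Illustrative latent-refinement loop; a sketch, not the submission's TRM."""

    def __init__(self, d_model: int, n_tokens: int, n_classes: int, n_steps: int = 6):
        super().__init__()
        self.embed = nn.Embedding(n_classes, d_model)             # embed shuffled puzzle-piece ids
        self.update = nn.GRUCell(d_model * n_tokens, d_model)     # one shared update of the latent "thought"
        self.readout = nn.Linear(d_model, n_tokens * n_classes)   # decode latent into per-position predictions
        self.n_steps = n_steps
        self.n_tokens = n_tokens
        self.n_classes = n_classes

    def forward(self, pieces: torch.Tensor) -> torch.Tensor:
        # pieces: (batch, n_tokens) integer ids of shuffled jigsaw pieces
        x = self.embed(pieces).flatten(1)                          # (batch, n_tokens * d_model)
        z = torch.zeros(pieces.size(0), self.update.hidden_size, device=pieces.device)
        for _ in range(self.n_steps):                              # reuse the same weights each step
            z = self.update(x, z)                                  # iteratively refine the latent state
        logits = self.readout(z)
        return logits.view(-1, self.n_tokens, self.n_classes)     # placement logits per grid position
```

Because the same update weights are reused at every step, the parameter count stays fixed while effective depth grows with n_steps, which is the contrast with stacking additional transformer layers that the abstract draws.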
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 90