Parcae: A Dynamical Systems Perspective on Stable Looped LLMs

Published: 02 Mar 2026, Last Modified: 18 Mar 2026. LIT Workshop @ ICLR 2026. License: CC BY 4.0
Track: long paper (up to 10 pages)
Keywords: Recurrent Depth Models, Dynamical Systems, Test-Time Compute Scaling, Universal Transformers, Language Models
Abstract: Traditional fixed-depth architectures scale performance by increasing training FLOPs, typically through expanded parameterization at the cost of a larger memory footprint. Recent work has suggested a promising alternative, looped architectures, which increase FLOPs while keeping the parameter count constant by looping layers in the model. While initial results are promising, existing recipes for training looped architectures can be unstable, suffering from residual explosion and loss spikes. To address these challenges, we recast layer looping as a nonlinear time-variant dynamical system. Linearizing this system, we derive strict theoretical conditions on its parameters for stability. With these conditions in mind, we propose Parcae, a novel looped Transformer that enforces stability on the looped layers. When trained on the same data, Parcae converges up to 1.25x faster than prior looped models, while reducing downstream perplexity by up to 27% and 64% relative to similarly sized prior looped models and Transformers, respectively. Finally, we investigate recurrence as a means of increasing FLOPs, observing a new scaling axis in training and providing a way to scale test-time compute and improve quality with constant memory overhead.
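As an illustrative sketch only (the symbols below are assumptions for exposition, not the paper's exact derivation), the dynamical-systems view of layer looping can be written with a shared looped block f_\theta applied repeatedly to a hidden state h_t, conditioned on the input x; linearizing around the trajectory gives a standard sufficient condition for perturbations not to explode across loop iterations:

    h_{t+1} = f_\theta(h_t, x), \qquad
    \delta h_{t+1} \approx J_t \, \delta h_t, \quad
    J_t = \left.\frac{\partial f_\theta}{\partial h}\right|_{h = h_t},

    \|\delta h_T\| \le \Big( \prod_{t=0}^{T-1} \|J_t\| \Big) \|\delta h_0\|, \qquad
    \text{bounded if } \sup_t \|J_t\| \le 1.

In this framing, a stability-enforcing looped Transformer would constrain the looped layers so that the linearized update does not amplify perturbations across iterations, which is consistent with the abstract's goal of avoiding residual explosion.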
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 50