Let's Let's Let's Let's... Understand Looping in Reasoning Models

ICLR 2026 Conference Submission 21639 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: reasoning models, looping, llms, inference-time compute, learning theory
TL;DR: Imperfect learning triggers endless repetitions in reasoning models.
Abstract: Reasoning models (e.g., DeepSeek-R1) use extra inference-time compute to write long chains of thought and solve harder problems. Yet they often loop---repeating the same text---especially at low temperatures or with greedy decoding. We take a step toward understanding why. We evaluate several open reasoning models and find that looping is common at low temperatures. Within a family, higher-capacity models loop less, and for distilled models, the student loops far more even when the teacher rarely does. This points to imperfect learning as a key cause. We then demonstrate two ways errors in learning can cause loops, using a simple graph-traversal setup. First, when the correct next action is hard to learn but an easy cyclic action is available, the model puts relatively more probability on the easy action and gets stuck. Second, errors across time steps in a chain of thought can be correlated, which drives repetition. Finally, we discuss potential avenues for reducing looping and implications beyond looping.
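To make the first mechanism concrete, here is a minimal illustrative sketch (not the paper's actual setup): a toy graph node where an imperfectly learned policy assigns slightly more probability to an easy cyclic action than to the correct advancing action, so greedy decoding repeats the same step forever while temperature sampling eventually escapes. The logits and node/action names are assumptions chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical learned scores at the "stuck" node:
# action 0 cycles back to the same node, action 1 advances toward the goal.
# Imperfect learning gives the easy cyclic action a slight edge.
logits = np.array([1.2, 1.0])  # [cycle, advance]

def decode(temperature, max_steps=50):
    """Return how many steps it takes to advance (capped at max_steps)."""
    for step in range(max_steps):
        if temperature == 0.0:
            action = int(np.argmax(logits))        # greedy decoding
        else:
            p = np.exp(logits / temperature)
            p /= p.sum()
            action = int(rng.choice(2, p=p))       # temperature sampling
        if action == 1:                            # advanced: loop broken
            return step + 1
        # action 0: back at the same node; the state (and hence the logits)
        # is unchanged, so the same choice repeats -- the looping behaviour.
    return max_steps

print("greedy        :", decode(0.0), "steps (never advances -> loop)")
print("temperature=1 :", decode(1.0), "steps")
```

Because the greedy path never changes the state, the argmax choice is identical at every step, which is exactly why looping shows up at low temperatures; sampling at temperature 1 typically advances within a few steps in this sketch.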
Primary Area: foundation or frontier models, including LLMs
Submission Number: 21639