Keywords: chain-of-thought, reasoning, summarization
TL;DR: We can effectively extend the thinking time of reasoning LLMs by iteratively summarizing the model's chain of thought.
Abstract: Training large language models to "think" longer by generating chains of thought has led to breakthroughs in their reasoning capabilities.
However, their limited context length is a barrier to scaling their thinking time even further.
We investigate iterated summarization as a practical approach to extend thinking time: models alternate between summarizing lengthy reasoning traces and reasoning about the problem given summaries of previous attempts.
There are many possible summarization strategies, so a key scientific question emerges: what types of summaries effectively compress lengthy reasoning traces?
To study this, we explore the design space of summarization strategies and evaluate their performance in the context of iterated summarization. On AIME 2024 & 2025, our best iterated summarization method, which preserves backtracking behavior, boosts accuracy by 11% over initial reasoning attempts and significantly surpasses baseline methods of extending test-time compute.
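The alternation described in the abstract can be sketched as a simple loop. This is a minimal illustration, not the authors' implementation: `reason` and `summarize` are hypothetical stand-ins for calls to a reasoning LLM and a summarizer, and the budget/round parameters are invented for the sketch.

```python
def reason(problem, summary=None, budget=4):
    # Hypothetical stand-in for a reasoning LLM call: produces one
    # "thought" per unit of budget, conditioned on any prior summary.
    trace = [summary] if summary else []
    trace += [f"thought about {problem}" for _ in range(budget)]
    return trace

def summarize(trace, max_len=2):
    # Hypothetical compressor: keeps only the last few steps, standing
    # in for an LLM-written summary of a lengthy reasoning trace.
    return " | ".join(trace[-max_len:])

def iterated_summarization(problem, rounds=3):
    # Alternate between reasoning and summarizing so total thinking
    # time can grow beyond a single context window.
    summary = None
    for _ in range(rounds):
        trace = reason(problem, summary)
        summary = summarize(trace)
    return summary
```

Each round starts from a compressed summary of the previous attempt rather than the full trace, which is what lets thinking time scale past the context limit.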
Submission Number: 24