Keywords: reasoning, llms
TL;DR: This paper shows that reformatting reasoning traces into explicit scratch work and conclusion blocks improves LLM reasoning performance and enables pruning mechanisms that reduce context length while preserving accuracy.
Abstract: Large language models (LLMs) excel at generating long chains of thought, but long reasoning traces are often verbose and memory-inefficient. In this work, we introduce $\textit{Structured Thoughts}$, a framework that organizes reasoning into alternating $\texttt{<try>}$ and $\texttt{<outcome>}$ blocks: $\texttt{<try>}$ captures exploratory scratch work, while $\texttt{<outcome>}$ contains the distilled conclusion of that step. We construct a dataset of structured thoughts by segmenting reasoning traces into $\texttt{<try>}$ blocks and prompting an LLM to summarize each step into its corresponding $\texttt{<outcome>}$. Fine-tuning pretrained foundation models on this reformatted data produces models that adopt the structured reasoning style, yielding performance gains of up to 8.08\% on reasoning benchmarks compared to standard SFT. The explicit structure also enables context pruning: after each $\texttt{<try>/<outcome>}$ pair is complete, the $\texttt{<try>}$ block can be pruned, allowing the model to retain conclusions without keeping the full scratch work in context. A proof-of-concept pruning implementation achieves an average of 85\% context (memory) savings at the cost of an 8.67\% performance drop across mathematical tasks.
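The pruning rule described in the abstract can be sketched as a simple string transformation. This is a minimal illustration, not the paper's implementation: the tag names follow the abstract, but the function name and the regex-based approach are assumptions.

```python
import re

def prune_tries(trace: str) -> str:
    """Hypothetical sketch of <try>/<outcome> context pruning:
    drop the scratch work inside each completed pair, keeping
    only the distilled <outcome> conclusions in the context."""
    # Remove every <try>...</try> block that is immediately
    # followed by its closing <outcome>...</outcome> block.
    return re.sub(
        r"<try>.*?</try>\s*(?=<outcome>.*?</outcome>)",
        "",
        trace,
        flags=re.DOTALL,
    )

trace = (
    "<try>Compute 12*7 by splitting: 10*7 + 2*7 = 70 + 14.</try>"
    "<outcome>12*7 = 84</outcome>"
    "<try>Check parity: 84 ends in 4, so it is even.</try>"
    "<outcome>84 is even</outcome>"
)
print(prune_tries(trace))
# Only the two <outcome> blocks remain in the pruned context.
```

In a real inference loop the same idea would apply to the KV cache rather than a raw string, evicting the tokens of each closed `<try>` span once its `<outcome>` has been emitted.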
Primary Area: foundation or frontier models, including LLMs
Submission Number: 19652