Keywords: Large reasoning model, draft–summary paradigm, context refinement
Abstract: Large language models (LLMs) have demonstrated strong reasoning abilities, driven in part by reinforcement learning–based optimization methods such as Reinforcement Learning with Verifiable Rewards (RLVR). These methods encourage a slow-thinking paradigm, in which models produce detailed intermediate steps between designated “thinking tokens.” Models employing such strategies are commonly referred to as Large Reasoning Models (LRMs). Despite notable progress, LRMs lack selective retention—the ability to discard redundant reasoning while preserving only the intermediate results that are useful for subsequent steps. In contrast, humans dynamically maintain a structured “working state,” continuously filtering out unproductive thoughts and retaining essential ones. To examine this limitation, we introduce a lightweight arithmetic benchmark designed to isolate reasoning behaviors. Through systematic evaluation, we show that redundant intermediate traces—specifically those that are low-quality or irrelevant—significantly degrade performance. Analyses of task accuracy, behavioral patterns, and attention allocation confirm that LRMs often rely on the entire reasoning context, including outdated or unhelpful information. To address this issue, we propose Dynamic Context Refinement (D-Refine), an inference-time mechanism that selectively organizes and condenses reasoning steps as they are generated. Experiments on diverse benchmarks demonstrate consistent performance gains, highlighting the importance of maintaining a well-structured working state for accurate and efficient reasoning. This work establishes selective retention as a key principle for improving LRM reasoning.
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 23876