Keywords: LLM, Efficient Reasoning
Abstract: Standard Large Language Models (LLMs) continuously accumulate tokens in their reasoning chains but lack a mechanism to release information that is no longer necessary for the final answer. This accumulation can populate the context window with redundant content, such as dead-end paths or transient verification steps, which can distract the attention mechanism and impede the coherence of long-form reasoning.
In this paper, we introduce Free()LM, an architecture that integrates a \texttt{free()} function to actively manage reasoning context. We augment the base model with a lightweight, trainable Free-Module. During generation, this module is activated at regular intervals to output structured commands that identify and remove redundant segments of the reasoning trace. By dynamically pruning the context, Free()LM maintains a managed workspace throughout the inference process.
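The pruning loop described above can be sketched in miniature. This is an illustrative assumption of the mechanism, not the paper's implementation: the names `free_module`, `FREE_INTERVAL`, and the dead-end heuristic are hypothetical stand-ins for the trainable Free-Module and its structured removal commands.

```python
# Hypothetical sketch of periodic context pruning in the spirit of Free()LM.
# All names here (free_module, FREE_INTERVAL) are illustrative assumptions.

FREE_INTERVAL = 4  # invoke the pruning step every 4 generated segments


def free_module(segments):
    """Stand-in for the trainable Free-Module: return indices of segments
    judged redundant. Here, a toy heuristic flags dead-end segments."""
    return [i for i, s in enumerate(segments) if "dead-end" in s]


def generate_with_free(steps):
    """Accumulate reasoning segments, pruning the context at regular intervals."""
    context = []
    for t, segment in enumerate(steps, start=1):
        context.append(segment)
        if t % FREE_INTERVAL == 0:
            drop = set(free_module(context))
            context = [s for i, s in enumerate(context) if i not in drop]
    return context


trace = ["explore A", "dead-end A", "explore B", "verify B",
         "dead-end C", "explore D", "combine B+D", "answer"]
print(generate_with_free(trace))
```

In this toy run, the two dead-end segments are released at the interval boundaries, so the final context holds only the segments that contribute to the answer; in the real system the removal decision is learned rather than keyword-based.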
Empirical results demonstrate that the Free-Module significantly enhances reasoning performance. Across six long-reasoning benchmarks, Free()LM improves Qwen3-8B and Qwen3-30B-A3B by an average of 4.4%. On Qwen3-235B-A22B, it yields an 11% relative gain on the Humanity's Last Exam (HLE) benchmark. Notably, on complex instances requiring over 70k thinking tokens, Free()LM raises the accuracy of Qwen3-235B-A22B from 0% to 28%. These performance gains are accompanied by improved efficiency: the approach reduces KV cache memory usage on HLE from 6.14GB to 3.34GB per sample. Our findings suggest that effective long-form reasoning depends not only on information retention but also on the strategic removal of redundant context.
Paper Type: Long
Research Area: Language Models
Research Area Keywords: chain-of-thought, robustness
Languages Studied: English
Submission Number: 874