TRACE: Adaptive Curtailment of Reasoning in Retrieval-Augmented Generation via Trajectory Reflection

ICLR 2026 Conference Submission 16661 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Large Language Models, Retrieval-Augmented Generation, Chain-of-Thought, overthinking
Abstract: Large Reasoning Language Models (LRLMs) excel at complex reasoning tasks by generating multi-step chains of thought. However, their autoregressive nature can lead to overthinking: a tendency to generate overly verbose reasoning that inflates computational cost and can even degrade accuracy. Recent methods mitigate overthinking by monitoring the model's internal confidence and terminating the reasoning process once a high-confidence answer is found. This strategy is effective when models reason from their parametric knowledge, but it risks failure in Retrieval-Augmented Generation (RAG) settings where external knowledge is introduced. We conduct an in-depth analysis of this issue and find that reasoning in RAG consistently follows a distinct two-stage Exploratory-Synthesizing pattern. Unlike purely parametric reasoning, where confidence accumulates gradually, the initial exploration phase over retrieved documents exhibits premature confidence: the model becomes highly certain after inspecting only partial evidence. This early confidence surge misleads conventional termination methods, causing them to halt prematurely and produce incorrect answers. To address this, we propose Trajectory Reflection with Adaptive Curtailment and Exit (TRACE), a training-free framework that applies a cascading check at each reasoning step. It first monitors the stability of the model's predictive beliefs to ensure sufficient knowledge exploration, and then assesses task completion by confirming high confidence in a synthesized final answer. Extensive experiments show that TRACE reduces token generation by 22% to 54% while achieving comparable or superior accuracy to standard Chain-of-Thought prompting.
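To make the cascading check concrete, the sketch below illustrates one way such an early-exit rule could look. It is a minimal illustration, not the paper's implementation: the function names (`should_exit`, `tv_distance`), the use of total variation distance as the belief-stability measure, and all thresholds are assumptions introduced here for exposition.

```python
from typing import Dict, List

def tv_distance(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Total variation distance between two categorical belief distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

def should_exit(
    belief_history: List[Dict[str, float]],  # per-step answer distributions (hypothetical representation)
    answer_confidence: float,                 # confidence in a synthesized final answer
    stability_window: int = 3,                # how many recent steps must agree (assumed value)
    stability_eps: float = 0.05,              # max allowed drift between consecutive steps (assumed value)
    conf_threshold: float = 0.9,              # minimum confidence to terminate (assumed value)
) -> bool:
    """Cascading check: exit only if beliefs are stable AND the synthesized answer is confident."""
    # Check 1: belief stability -- require enough exploration so that an early
    # confidence surge over partial retrieved evidence does not trigger an exit.
    if len(belief_history) < stability_window:
        return False
    recent = belief_history[-stability_window:]
    drifts = [tv_distance(a, b) for a, b in zip(recent, recent[1:])]
    if any(d > stability_eps for d in drifts):
        return False
    # Check 2: task completion -- only then trust a high-confidence synthesized answer.
    return answer_confidence >= conf_threshold

# Toy usage: beliefs converge on "Paris" and the synthesized answer is confident.
history = [
    {"Paris": 0.40, "Lyon": 0.60},
    {"Paris": 0.85, "Lyon": 0.15},
    {"Paris": 0.88, "Lyon": 0.12},
    {"Paris": 0.90, "Lyon": 0.10},
]
print(should_exit(history, answer_confidence=0.93))  # True: stable beliefs and a confident answer
```

The design point the sketch captures is the ordering of the two checks: confidence alone is not trusted until the belief trajectory has stopped shifting, which is what distinguishes this style of termination from confidence-only early exit.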
Primary Area: generative models
Submission Number: 16661