ConCISE: Confidence-guided Compression in Step-by-step Efficient Reasoning

ACL ARR 2025 May Submission4807 Authors

20 May 2025 (modified: 03 Jul 2025) · License: CC BY 4.0
Abstract: Large Reasoning Models (LRMs) perform strongly on complex reasoning tasks via Chain-of-Thought (CoT) prompting, but often produce verbose outputs that increase computational overhead. Existing fine-tuning-based compression methods either perform post-hoc pruning, risking disruption of reasoning coherence, or rely on sampling-based selection, which fails to remove redundant content thoroughly. To address these limitations, this work begins by framing two key patterns of redundant reflection in LRMs through a confidence-guided perspective: $\textit{Confidence Deficit}$, wherein the model redundantly reflects on already-correct intermediate steps, and $\textit{Termination Delay}$, wherein reflection continues after a verified, confident answer. Based on this, we introduce $\textbf{ConCISE}$ ($\textbf{Con}$fidence-guided $\textbf{C}$ompression $\textbf{I}$n $\textbf{S}$tep-by-step $\textbf{E}$fficient Reasoning), a framework designed to generate concise reasoning chains, integrating $\textit{Confidence Injection}$ to boost reasoning confidence and $\textit{Early Stopping}$ to terminate reasoning once confidence is sufficient. Extensive experiments demonstrate that, compared to baseline methods, fine-tuning LRMs on $\textbf{ConCISE}$-generated data yields a better balance between compression and task performance, reducing output length by up to $\textasciitilde50\%$ under SimPO while maintaining high task accuracy.
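The Early Stopping mechanism described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the `Step` dataclass, the `reason_with_early_stopping` helper, and the threshold value are hypothetical stand-ins, not the paper's actual confidence detector or generation loop.

```python
# Hypothetical sketch of confidence-guided early stopping: a reasoning
# loop keeps steps until one reaches a confidence threshold, instead of
# continuing to reflect after a confident answer ("Termination Delay").
from dataclasses import dataclass

@dataclass
class Step:
    text: str
    confidence: float  # assumed score in [0, 1], e.g. from token probabilities

def reason_with_early_stopping(steps, threshold=0.9, max_steps=8):
    """Accumulate reasoning steps; stop once confidence is sufficient."""
    chain = []
    for step in steps[:max_steps]:
        chain.append(step)
        if step.confidence >= threshold:
            break  # Early Stopping: answer is already confident
    return chain
```

Under this sketch, a chain whose second step is already confident would be truncated there, dropping the redundant reflection steps that follow.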
Paper Type: Long
Research Area: Language Modeling
Research Area Keywords: chain-of-thought, fine-tuning, data influence, math QA
Contribution Types: Model analysis & interpretability, Data analysis
Languages Studied: English
Submission Number: 4807