Awakening LLMs’ Reasoning Potential: A Fine-Grained Pipeline to Evaluate and Mitigate Vague Perception

ACL ARR 2026 January Submission859 Authors

25 Dec 2025 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · CC BY 4.0
Keywords: LLM, Evaluation, Reasoning
Abstract: Large language models (LLMs) are increasingly trained to abstain from difficult questions by answering \emph{unknown}. However, we observe that LLMs often misuse this option: they output \emph{unknown} even when they could actually solve the question, or they fail to recognize when a question is truly unsolvable. We formalize this mismatch between a model's latent ability and its inclination to abstain as the \emph{Vague Perception} phenomenon. We introduce the \emph{WakenLLM} pipeline, which (1) extracts \emph{Vague Perception} samples and (2) measures how many of them can be converted into correct answers under stimulation. Based on stage-wise metrics (e.g., TCR and OCR) and the upper-bound accuracy $\mathrm{Acc}_{\text{WakenLLM}}$, we quantify LLMs' reasoning potential beyond one-shot accuracy. Experiments on six LLMs suggest that, without further training or parameter updates, LLMs can achieve up to a 68.53\% accuracy increase on \emph{Vague Perception} samples through our pipeline. We further analyze how \emph{Vague Perception}, \emph{Conformity}, and \emph{Degradation} vary across model families and parameter scales, and provide model selection strategies for multi-stage reasoning workflows. Finally, by comparing \emph{WakenLLM} with mainstream reasoning baselines, both training-based and training-free, we show that existing baselines activate only a small portion of LLMs' reasoning potential, highlighting perception-aware reasoning as a promising direction for future LLM design. Code and datasets are available at https://anonymous.4open.science/r/WakenLLM-toolkit-018B.
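To make the two-stage pipeline concrete, here is a minimal Python sketch of one plausible reading of the abstract: stage 1 flags samples where abstention disagrees with true solvability, and stage 2 re-queries those samples under stimulation and counts conversions. All names (`Sample`, `restimulate`, field names) are hypothetical illustrations, not the paper's actual interface, and the conversion-rate and upper-bound formulas are assumptions; the exact definitions of TCR, OCR, and $\mathrm{Acc}_{\text{WakenLLM}}$ are given in the paper.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical sketch of the WakenLLM pipeline as described in the abstract.

@dataclass
class Sample:
    question: str
    gold: str          # ground truth; "unknown" marks a truly unsolvable item
    first_answer: str  # model's initial one-shot answer

def is_vague_perception(s: Sample) -> bool:
    """Vague Perception: the model's abstention disagrees with the
    question's true solvability (abstains on a solvable item, or
    answers an unsolvable one)."""
    abstained = s.first_answer == "unknown"
    unsolvable = s.gold == "unknown"
    return abstained != unsolvable

def waken(samples: List[Sample],
          restimulate: Callable[[str], str]) -> Tuple[float, float]:
    """Stage 1: extract Vague Perception samples.
    Stage 2: re-query each under stimulation and count conversions."""
    vague = [s for s in samples if is_vague_perception(s)]
    converted = sum(restimulate(s.question) == s.gold for s in vague)
    # Conversion rate over Vague Perception samples (assumed metric shape).
    rate = converted / len(vague) if vague else 0.0
    # Upper-bound accuracy: one-shot correct answers plus converted ones.
    base_correct = sum(s.first_answer == s.gold for s in samples)
    acc_waken = (base_correct + converted) / len(samples)
    return rate, acc_waken
```

Note that `restimulate` is passed in as a callable, so any prompting strategy (hint injection, re-asking with feedback, etc.) can be plugged in without changing the evaluation logic.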
Paper Type: Long
Research Area: Ethics, Bias, and Fairness
Research Area Keywords: evaluation, reasoning
Contribution Types: Reproduction study, Data analysis
Languages Studied: English
Submission Number: 859