More Yap Less Meaning: Uncovering Self-Improvement Behavior in LLMs

ACL ARR 2026 March Submission 1387 Authors

16 Mar 2026 (modified: 07 Jun 2026), ACL ARR 2026 March Submission, CC BY 4.0
Keywords: Large Language Models, Self-Correction, Reasoning, Faithfulness
Abstract: Large Language Models (LLMs) have recently made rapid progress across many domains and applications. However, their capability for self-improvement, i.e., whether they can recognise and correct flaws in their own reasoning, remains in doubt. In this study, we address this question by constructing a test of sufficiency that rigorously examines LLMs' self-correction capabilities. We propose a minimal three-step self-correction pipeline that (1) collects initial LLM answers, (2) prompts the same model to generate hints for its incorrect responses given the ground truth, and (3) feeds the model the same question together with its own feedback to refine the initial answer. We evaluate a variety of instruction-tuned and reasoning language models in this setup on arithmetic and logical reasoning benchmarks. Our findings show that LLMs with injected hint sentences yield only a 4.4% gain over initial question-answering accuracy. Even when the correct answer is provided alongside the model's incorrect reasoning, language models fail to identify what was missing in that reasoning, and hints that lead to corrections differ little semantically from hints that do not. Furthermore, longer hints are positively correlated with incorrect final answers. This suggests that longer deliberation can hinder the reasoning process and that LLM performance does not necessarily scale with a larger compute budget.
Paper Type: Short
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: Robustness
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English
Submission Number: 1387
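
The three-step pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the `Model` callable, the prompt wording, the exact-match scoring in `is_correct`, and the `toy_model` stand-in are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical interface: any function mapping a prompt string to a completion.
Model = Callable[[str], str]


@dataclass
class Example:
    question: str
    gold: str  # ground-truth answer


def is_correct(answer: str, gold: str) -> bool:
    # Assumed scoring rule: case-insensitive exact match after trimming.
    return answer.strip().lower() == gold.strip().lower()


def self_correct(model: Model, data: List[Example]) -> Dict[str, float]:
    if not data:
        raise ValueError("data must contain at least one example")
    initial_hits = 0
    final_hits = 0
    for ex in data:
        # Step 1: collect the model's initial answer.
        first = model(f"Question: {ex.question}\nAnswer:")
        if is_correct(first, ex.gold):
            initial_hits += 1
            final_hits += 1
            continue
        # Step 2: the same model writes a hint for its incorrect response,
        # given the ground truth (prompt wording is an assumption).
        hint = model(
            f"Question: {ex.question}\n"
            f"Your answer: {first}\n"
            f"Correct answer: {ex.gold}\n"
            "Write a short hint explaining what went wrong."
        )
        # Step 3: re-ask the same question with the model's own hint attached.
        second = model(f"Question: {ex.question}\nHint: {hint}\nAnswer:")
        if is_correct(second, ex.gold):
            final_hits += 1
    n = len(data)
    return {"initial_acc": initial_hits / n, "final_acc": final_hits / n}


def toy_model(prompt: str) -> str:
    # Deterministic stand-in for an LLM, used only to exercise the pipeline.
    if "Hint:" in prompt:
        return "4"
    if "Correct answer:" in prompt:
        return "Add the two numbers carefully."
    return "5"


examples = [Example("What is 2 + 2?", "4"), Example("What is 2 + 3?", "5")]
result = self_correct(toy_model, examples)
```

In this sketch, the difference between `final_acc` and `initial_acc` plays the role of the gain that the abstract reports; with a real model, `toy_model` would be replaced by a call to the model under evaluation.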