A Probabilistic Inference Scaling Theory for LLM Self-Correction

ACL ARR 2025 February Submission 3488 Authors

15 Feb 2025 (modified: 09 May 2025) · ACL ARR 2025 February Submission · CC BY 4.0
Abstract: Large Language Models (LLMs) have demonstrated the capability to refine their generated answers through self-correction, enabling continuous performance improvement over multiple rounds. However, the mechanisms underlying how and why accuracy evolves during this iterative process remain unexplored. To fill this gap, we propose a probabilistic theory to model the dynamics of accuracy change and explain the performance improvements observed in multi-round self-correction. Through mathematical derivation, we establish that the accuracy after the $t^{\text{th}}$ round of self-correction is given by $Acc_t = Upp - \alpha^t(Upp - Acc_0)$, where $Acc_0$ denotes the initial accuracy, $Upp$ represents the upper bound to which accuracy converges, and $\alpha$ determines the rate of convergence. Based on our theory, these parameters can be estimated, and the predicted accuracy curve can then be obtained, from only a single round of self-correction. Extensive experiments across diverse models and datasets demonstrate that our theoretical predictions align closely with empirical accuracy curves, validating the effectiveness of the theory. Additionally, we derive and experimentally verify three corollaries, further substantiating the theory. Finally, we discuss failure scenarios, bottlenecks, and the potential of self-correction from the perspective of our theory. Our work provides a theoretical foundation for understanding LLM self-correction, thus paving the way for further explorations.
Paper Type: Long
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: self-correction, inference scaling, theory
Contribution Types: Model analysis & interpretability, Theory
Languages Studied: English
Submission Number: 3488