Deciphering Self-Improvement: Large Language Models Can Take False First Steps

ICLR 2026 Conference Submission 17982 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Large Language Models, self-improvement, planning, Bayesian, MCMC
Abstract: One of the most striking capabilities of Large Language Models (LLMs) is their apparent ability to refine outputs through a process of self-improvement. Yet how an autoregressive model acquires such a capability remains unclear. We propose a mechanistic model of LLM self-improvement grounded in a Bayesian perspective on token generation. In this view, LLMs maintain latent plans for future token generation, which gradually stabilize as more self-generated tokens are autoregressively incorporated back into the context. Across two single-dimensional random number generation experiments, we find evidence consistent with the dynamic patterns of planning and generation predicted by this Bayesian model. Building on these insights, we introduce self-play Markov Chain Monte Carlo (spMCMC), an extension of MCMC-with-LLMs designed to elicit reliable reward signals for self-improvement in open-ended text generation. Human evaluations show that spMCMC identifies higher-quality outputs that are often overlooked by both greedy decoding and other self-evaluation methods.
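As a rough illustration of the MCMC-with-LLMs idea the abstract builds on (not the paper's spMCMC itself, whose details are not given here), the sketch below runs a Metropolis-style accept/reject loop over candidate revisions of a draft and keeps the highest-reward text seen. The functions propose_revision and reward, the temperature, and the symmetric-proposal assumption are all hypothetical placeholders standing in for LLM calls and a self-evaluation or preference reward.

```python
import math
import random

# Hypothetical stand-ins: in practice these would call an LLM to rewrite a
# draft and to score it (e.g., via self-evaluation or a learned reward).
def propose_revision(text: str) -> str:
    return text + " (revised)"

def reward(text: str) -> float:
    return float(len(text))  # placeholder reward for the sketch

def mcmc_self_improve(draft: str, steps: int = 50, temperature: float = 1.0) -> str:
    """Metropolis-style accept/reject over candidate generations (sketch)."""
    current, current_r = draft, reward(draft)
    best, best_r = current, current_r
    for _ in range(steps):
        candidate = propose_revision(current)
        cand_r = reward(candidate)
        # Accept with probability min(1, exp((r' - r) / T)); this assumes a
        # symmetric proposal, which a real LLM proposer need not satisfy.
        if math.log(random.random() + 1e-12) < (cand_r - current_r) / temperature:
            current, current_r = candidate, cand_r
        if current_r > best_r:
            best, best_r = current, current_r
    return best

if __name__ == "__main__":
    print(mcmc_self_improve("An initial draft answer."))
```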
Primary Area: applications to neuroscience & cognitive science
Submission Number: 17982