"Trust Yourself": Unsupervised Self-Evolution of Reasoning through Model-Intrinsic Verification

ACL ARR 2026 January Submission8245 Authors

06 Jan 2026 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · CC BY 4.0
Keywords: self-evolving, entropy, unsupervised
Abstract: Self-evolving reasoning aims to enable large language models (LLMs) to improve their reasoning capabilities through iterative feedback rather than repeated retraining. However, existing approaches remain limited by their reliance on external supervision or by rigid memory mechanisms. These limitations hinder scalable self-evolution in truly unsupervised settings. We propose \textbf{SEER}, an \emph{unsupervised} \textbf{S}elf-\textbf{E}volution framework for \textbf{E}ntropy-aware \textbf{R}easoning that enables LLMs to self-improve using only intrinsic signals. SEER treats reasoning as a sampling process and employs MCMC-based exploration to generate diverse reasoning trajectories. It then applies a dual-stage, label-free filtering mechanism that distills compact key-point experiences and retains them only when they increase self-consistency and reduce predictive entropy. The verified experiences are stored in a persistent memory bank. During inference, SEER monitors uncertainty via entropy and dynamically retrieves memories only when sustained uncertainty is detected, enabling selective and low-noise memory injection. Extensive experiments show that SEER consistently outperforms strong reasoning baselines. These results indicate that effective self-evolving reasoning can be achieved {without} gold labels, human feedback, or reward models, highlighting SEER as a step toward scalable, training-free, unsupervised self-evolution in LLMs.
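The abstract's entropy-gated retrieval can be illustrated with a minimal sketch. This is not the authors' implementation; it only shows one plausible reading of "sustained uncertainty": token-level predictive (Shannon) entropy that stays above a threshold for several consecutive decoding steps before memory retrieval is triggered. The function names, threshold, and window size are all illustrative assumptions.

```python
import math

def predictive_entropy(probs):
    """Shannon entropy (in nats) of a next-token probability
    distribution; higher values mean the model is less certain."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def should_retrieve(step_entropies, threshold=1.0, window=3):
    """Illustrative gate: fire only when entropy exceeds `threshold`
    for `window` consecutive steps (sustained uncertainty), so a
    one-step entropy spike does not trigger memory injection."""
    if len(step_entropies) < window:
        return False
    return all(h > threshold for h in step_entropies[-window:])
```

A uniform distribution over four tokens yields entropy ln 4 ≈ 1.39 (uncertain), while a sharply peaked one yields a value near zero, so only stretches of genuinely diffuse predictions would pass the gate under this sketch.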
Paper Type: Long
Research Area: Language Models
Research Area Keywords: Generation;Language Modeling
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 8245