Enhancing Reasoning in Large Language Models via Entropy-Aware Self-Evolution

ICLR 2026 Conference Submission 17203 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: large language model, self-evolution, math reasoning
TL;DR: We propose an entropy-aware self-evolution framework that leverages verifier feedback and entropy-guided strategies to enhance both correctness and exploration in LLM reasoning.
Abstract: Large language models (LLMs) have exhibited remarkable reasoning capabilities. However, when self-evolution frameworks are employed to further enhance these models, a key challenge lies in balancing correctness, which ensures reliable supervision, with exploration, which promotes diverse reasoning trajectories. To address this dilemma, we propose an $\textbf{entropy-aware self-evolution framework}$ that integrates verifier feedback with both sequence-level and token-level entropy. Our approach incorporates two key strategies: (i) $\textbf{high-entropy selection}$ of verified trajectories, which provides informative yet reliable training signals; and (ii) $\textbf{entropy-aware rethinking}$, which revisits uncertain reasoning steps to uncover alternative solutions. Theoretically, we establish a connection between entropy and the expected supervised fine-tuning loss, showing that high-entropy trajectories yield stronger learning signals. Empirically, experiments across multiple reasoning benchmarks demonstrate that our framework consistently improves both reliability and exploratory capacity over strong baselines. With the proposed framework, InternLM2.5-1.8B improves by $8.27\%$ and surpasses the strong baseline by $1.82\%$ on GSM8K, as measured by Pass@16. Our results highlight entropy as a principled driver of self-improvement, enabling LLMs to evolve into models that are not only more accurate but also more exploratory.
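The stated entropy–loss connection admits a one-line view: for a trajectory $y$ sampled from the current model $p_\theta$, the expected supervised fine-tuning loss satisfies $\mathbb{E}_{y \sim p_\theta}[-\log p_\theta(y)] = H(p_\theta)$, so higher-entropy trajectories are exactly those carrying larger log-loss and hence stronger gradient signals; the paper's formal statement may refine this.

The sketch below illustrates the two strategies as described in the abstract: ranking verifier-approved trajectories by sequence-level entropy, and locating a token-level branch point for rethinking. It is a minimal sketch under our reading of the abstract, not the authors' implementation; the `Trajectory` format and the helper names (`sequence_entropy`, `select_high_entropy`, `rethink_branch_point`) are assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Trajectory:
    """One sampled reasoning trace (hypothetical format for illustration).

    token_dists: per-step probability distributions over the vocabulary,
                 each a list of probabilities summing to 1.
    verified:    whether an external verifier accepted the final answer.
    """
    token_dists: list
    verified: bool


def token_entropy(dist):
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in dist if p > 0.0)


def sequence_entropy(traj):
    """Sequence-level entropy: mean token-level entropy over the trace."""
    ents = [token_entropy(d) for d in traj.token_dists]
    return sum(ents) / max(len(ents), 1)


def select_high_entropy(trajectories, k):
    """Strategy (i): keep verifier-approved traces and prefer the most
    uncertain ones, which carry the strongest fine-tuning signal."""
    verified = [t for t in trajectories if t.verified]
    return sorted(verified, key=sequence_entropy, reverse=True)[:k]


def rethink_branch_point(traj):
    """Strategy (ii): index of the highest-entropy step, i.e. the most
    uncertain decision, from which alternative continuations are resampled."""
    ents = [token_entropy(d) for d in traj.token_dists]
    return max(range(len(ents)), key=ents.__getitem__)


# Example: two toy 2-step traces over a 3-token vocabulary. The uniform
# (high-entropy) trace is preferred over the near-deterministic one.
flat = Trajectory([[1/3, 1/3, 1/3], [1/3, 1/3, 1/3]], verified=True)
peaked = Trajectory([[0.98, 0.01, 0.01], [0.90, 0.05, 0.05]], verified=True)
assert select_high_entropy([flat, peaked], k=1) == [flat]
```

In a full self-evolution loop, the trajectories returned by `select_high_entropy` would feed the next round of supervised fine-tuning, while `rethink_branch_point` would mark where to resample alternative continuations.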
Primary Area: foundation or frontier models, including LLMs
Submission Number: 17203