Discovering and Leveraging Entropy-Complexity Relationships for Efficient Large Language Model Reasoning

18 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Large Language Model; Efficient Reasoning; Entropy; Overthinking
Abstract: Large Language Models (LLMs) suffer from "overthinking" — generating excessive reasoning chains even for simple problems, leading to computational inefficiency and potential accuracy degradation. To address this inefficiency, we first show systematically that response entropy serves as an effective intrinsic measure of problem complexity in LLM reasoning. Our key insight is that token-level entropy during generation provides a principled signal for complexity assessment: low entropy indicates high confidence suitable for direct answers, while high entropy signals the need for detailed reasoning. Building on this entropy-complexity relationship, we propose a novel two-stage training framework for adaptive reasoning. In Stage 1, we use Supervised Fine-Tuning (SFT) on NoThinking exemplars (concise direct answers without explicit reasoning) to endow the model with a concise answering capability. In Stage 2, we perform offline Proximal Policy Optimization (PPO) with an entropy-aware reward function to train models to dynamically select between concise and full reasoning modes based on problem complexity. This offline approach offers greater stability and efficiency than online RL methods. Experiments on the MATH500, AIME24, and GPQA benchmarks demonstrate that our method significantly reduces response length while maintaining accuracy, validating entropy as both a diagnostic tool and a training signal for efficient LLM reasoning. Our code is available at https://anonymous.4open.science/r/Efficient-Reasoning-8BA6.
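The core signal described in the abstract — mean token-level entropy of the next-token distribution as a proxy for problem complexity — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the use of mean entropy, and the threshold value are all illustrative assumptions.

```python
import numpy as np

def token_entropy(logits):
    """Shannon entropy (in nats) of the next-token distribution at each step.

    logits: array of shape (seq_len, vocab_size), one row per generated token.
    """
    # Numerically stable softmax over the vocabulary axis
    z = logits - logits.max(axis=-1, keepdims=True)
    p = np.exp(z)
    p /= p.sum(axis=-1, keepdims=True)
    return -(p * np.log(p + 1e-12)).sum(axis=-1)

def choose_mode(logits, threshold=1.0):
    """Illustrative mode selection: low mean entropy -> concise answer,
    high mean entropy -> full reasoning. Threshold is a made-up value."""
    return "concise" if token_entropy(logits).mean() < threshold else "full"

# Sharply peaked logits (high confidence) vs. uniform logits (high uncertainty)
peaked = np.zeros((4, 8)); peaked[:, 0] = 10.0
flat = np.zeros((4, 8))  # entropy per token = ln(8) ~= 2.08 nats
```

With these inputs, `choose_mode(peaked)` selects the concise mode and `choose_mode(flat)` selects full reasoning, mirroring the abstract's low-entropy/high-entropy distinction.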
Primary Area: generative models
Submission Number: 11598