Keywords: Large Language Models, Adaptive Reasoning, Reinforcement Learning, Intrinsic Uncertainty, Efficient Inference
Abstract: Large Reasoning Models (LRMs) excel at complex tasks using Chain-of-Thought prompting but suffer from 'overthinking', often allocating costly reasoning resources to simple queries. Existing adaptive methods typically rely on opaque reinforcement learning strategies based on reasoning-length penalties, which lack both interpretability and intrinsic grounding. We argue that true efficiency stems from a model's internalized self-awareness of task complexity. In this work, we introduce \textbf{Dubito-Pro}, a framework in which the LLM autonomously selects between 'Fast' and 'Slow' thinking modes based on the input context, without external classifiers or inference-time intervention. Our core insight is that Entropy Variance serves as a high-fidelity supervision signal for cognitive struggle during training. To instill this capability, we propose Intrinsic-Weighted Group Relative Policy Optimization (I-GRPO). Unlike standard RL approaches that reward only outcome correctness, I-GRPO introduces a Cognitive Alignment Reward, computed post hoc during training. This mechanism penalizes the model for selecting the 'Fast' path on high-variance (ambiguous) queries, effectively teaching it to anticipate its own uncertainty. Extensive experiments on a mixed-difficulty benchmark demonstrate that Dubito-Pro acquires a robust, intuitive switching policy: it reduces token costs by 80\% on simple tasks, improves overall accuracy by 7.47\%, and establishes a new Pareto frontier between efficiency and accuracy.
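The abstract's core signal, Entropy Variance, can be sketched as the variance of per-step token entropies over a generated sequence. The snippet below is a minimal illustration of that idea only; the function names and the exact aggregation are assumptions for illustration, not the paper's implementation.

```python
# Illustrative sketch (assumed formulation, not the paper's code):
# Entropy Variance = variance of per-step Shannon entropies of the
# model's next-token distributions over one generated sequence.
import math

def token_entropy(probs):
    """Shannon entropy (nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def entropy_variance(step_distributions):
    """Population variance of per-step entropies across a sequence."""
    ents = [token_entropy(p) for p in step_distributions]
    mean = sum(ents) / len(ents)
    return sum((e - mean) ** 2 for e in ents) / len(ents)

# A sequence whose steps are all equally (un)certain has zero variance;
# mixing confident and uncertain steps yields a positive variance,
# which the abstract treats as a marker of "cognitive struggle".
flat = [[0.25] * 4] * 3
mixed = [[0.25] * 4, [0.97, 0.01, 0.01, 0.01]]
print(entropy_variance(flat))   # 0.0
print(entropy_variance(mixed))  # > 0
```

Under this reading, the Cognitive Alignment Reward would penalize a 'Fast'-mode choice whenever `entropy_variance` for the query's sampled reasoning traces is high.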
Paper Type: Long
Research Area: LLM Efficiency
Research Area Keywords: LLM Efficiency, NLP in resource-constrained settings
Contribution Types: Approaches to low-resource settings, Approaches to low-compute settings (efficiency)
Languages Studied: English
Submission Number: 4217