Abstract: Large language models (LLMs) have demonstrated remarkable reasoning abilities through extensive test-time inference. However, such deep and lengthy reasoning frequently results in substantial computational overhead. Current methods either uniformly minimize reasoning tokens, neglecting the need for more intricate reasoning on complex tasks, or employ precise token-level control, which often hinges on accurate difficulty estimation and suffers from the model's unreliable interpretation of nuanced instructions. To address these limitations, we introduce AdaCtrl, a novel framework that dynamically adjusts its reasoning length based on the model's self-assessed problem difficulty and also allows human-in-the-loop control of the budget to prioritize either efficiency or effectiveness. Specifically, we develop a two-stage training pipeline: 1) a cold-start fine-tuning stage, where we design explicit difficulty-aware tags (e.g., "[Easy]" or "[Hard]") to indicate problem difficulty and train the model on a curated dataset to align its reasoning behavior with these difficulty levels; and 2) a difficulty-aware reinforcement learning stage, which further refines the model's adaptive reasoning behavior and calibrates its self-assessment of problem difficulty. In this way, AdaCtrl not only empowers the model to adaptively assess problem difficulty and adjust its reasoning budget allocation, but also enables the user to explicitly control the desired reasoning mode by injecting a specific difficulty-aware tag. Empirical results across four benchmarks show that, compared to different types of baselines, AdaCtrl effectively balances performance and computational efficiency, yielding performance improvements while dynamically reducing response lengths by up to 90%.
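To make the tag-injection idea concrete, below is a minimal sketch of how a user might either let the model self-assess difficulty or force a reasoning mode by prepending a difficulty-aware tag. The checkpoint name, prompt format, and generation settings are assumptions for illustration; only the "[Easy]"/"[Hard]" tag names come from the abstract.

```python
# Minimal sketch, assuming a hypothetical Hugging Face checkpoint and a
# prompt format in which the difficulty tag is prepended to the response.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "adactrl-demo-7b"  # hypothetical checkpoint name

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

def solve(problem: str, difficulty_tag: str | None = None) -> str:
    """Generate a solution; optionally force the reasoning budget
    by injecting an explicit "[Easy]" or "[Hard]" tag."""
    prompt = problem
    if difficulty_tag is not None:
        # Human-in-the-loop control: seed the response with the tag so the
        # model follows the corresponding short or long reasoning mode.
        prompt = f"{problem}\n{difficulty_tag}\n"
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=2048)
    # Strip the prompt tokens and return only the generated continuation.
    return tokenizer.decode(
        output_ids[0][inputs["input_ids"].shape[1]:],
        skip_special_tokens=True,
    )

# Adaptive mode: the model self-assesses difficulty and emits its own tag.
print(solve("What is 17 * 24?"))
# Forced efficiency mode: cap the reasoning budget by injecting "[Easy]".
print(solve("What is 17 * 24?", difficulty_tag="[Easy]"))
```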
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Frederic_Sala1
Submission Number: 6723