Abstract: Large language models (LLMs) have demonstrated remarkable reasoning abilities through extensive test-time inference. However, such deep and lengthy reasoning frequently results in substantial computational overhead. Current methods either uniformly minimize reasoning tokens, neglecting the need for more intricate reasoning on complex tasks, or employ precise token-level control, which often hinges on accurate difficulty estimation and suffers from the model's unreliable interpretation of nuanced instructions. To address these limitations, we introduce AdaCtrl, a novel framework that dynamically adjusts its reasoning length based on the model's self-assessed problem difficulty and also allows human-in-the-loop control of the budget to prioritize either efficiency or effectiveness. Specifically, we develop a two-stage training pipeline: 1) a cold-start fine-tuning stage, where we design explicit difficulty-aware tags (e.g., "[Easy]" or "[Hard]") to indicate problem difficulty and train the model on a curated dataset to align its reasoning behavior with these difficulty levels; and 2) a difficulty-aware reinforcement learning stage, which further refines the model's adaptive reasoning behavior and calibrates its self-assessment of problem difficulty. In this way, AdaCtrl not only empowers the model to adaptively assess problem difficulty and adjust its reasoning budget allocation, but also enables the user to explicitly control the desired reasoning mode by injecting a specific difficulty-aware tag. Empirical results across four benchmarks show that, compared to different types of baselines, AdaCtrl effectively balances performance and computational efficiency, yielding performance improvements while dynamically reducing response lengths by up to 90%.
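To make the tag-injection idea concrete, below is a minimal sketch of how a user might either let the model self-assess difficulty or force a reasoning mode by prepending a difficulty-aware tag. The checkpoint name, prompt format, and generation settings are assumptions for illustration; only the "[Easy]"/"[Hard]" tag names come from the abstract.

```python
# Minimal sketch, assuming a hypothetical Hugging Face checkpoint and a
# prompt format in which the difficulty tag is prepended to the response.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "adactrl-demo-7b"  # hypothetical checkpoint name

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

def solve(problem: str, difficulty_tag: str | None = None) -> str:
    """Generate a solution; optionally force the reasoning budget
    by injecting an explicit "[Easy]" or "[Hard]" tag."""
    prompt = problem
    if difficulty_tag is not None:
        # Human-in-the-loop control: seed the response with the tag so the
        # model follows the corresponding short or long reasoning mode.
        prompt = f"{problem}\n{difficulty_tag}\n"
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=2048)
    # Strip the prompt tokens and return only the generated continuation.
    return tokenizer.decode(
        output_ids[0][inputs["input_ids"].shape[1]:],
        skip_special_tokens=True,
    )

# Adaptive mode: the model self-assesses difficulty and emits its own tag.
print(solve("What is 17 * 24?"))
# Forced efficiency mode: cap the reasoning budget by injecting "[Easy]".
print(solve("What is 17 * 24?", difficulty_tag="[Easy]"))
```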
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Frederic_Sala1
Submission Number: 6723