Adapting Dynamic Sampling to Fine-Grained Rewards for Tool Learning with Curriculum Learning

ACL ARR 2026 January Submission5198 Authors

05 Jan 2026 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · CC BY 4.0
Keywords: Tool Learning, Reinforcement Learning, Dynamic Sampling, Curriculum Learning
Abstract: Dynamic sampling is a technique employed in reinforcement learning for Large Language Models to mitigate training instability, which is often caused by the gradient-decreasing problem. However, existing dynamic sampling approaches are predominantly designed for tasks with binary (success/failure) rewards and struggle to adapt to the complex reward structures of tool learning, which involves interdependent sub-tasks of varying difficulty and yields fine-grained, multi-faceted rewards. To bridge this gap, we introduce Dynamic Sampling with Curriculum Learning (DSCL), an algorithm tailored to the intricate dynamics of tool learning. DSCL integrates two core components: a Reward-Based Dynamic Sampler, which leverages multi-dimensional reward statistics to prioritize high-value training data, and a Task-Based Dynamic Curriculum Learning method, which addresses the credit assignment problem caused by fine-grained rewards. Extensive experiments on widely used tool learning benchmarks demonstrate the efficacy of our approach: DSCL significantly improves model performance, outperforming strong baselines by 4.75% on the BFCL V3 dataset and 4.02% on the API-Bank dataset. Our method will be made publicly available at http://anonymous.com/.
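The full paper is not included on this page, so the sketch below is only a hypothetical Python illustration of the two components named in the abstract: a reward-based dynamic sampler that drops prompts whose rollout rewards carry no learning signal, and a task-based curriculum that admits harder sub-tasks over time. All function names, parameters, and the linear difficulty schedule are assumptions, not the authors' implementation.

```python
"""Minimal sketch (not the authors' code) of the two DSCL components described
in the abstract. Assumed interfaces: each prompt has a group of rollouts, each
rollout carries a fine-grained reward vector, and tasks have difficulty labels."""
import numpy as np


def reward_based_dynamic_sample(reward_groups, keep_ratio=0.5):
    """Rank prompts by the spread of their multi-dimensional rollout rewards
    and keep the most informative fraction.

    reward_groups: list of arrays, one per prompt, shaped (num_rollouts, num_reward_dims).
    Prompts whose rollouts all receive identical rewards yield no policy-gradient
    signal, so they score lowest and are dropped first.
    """
    scores = []
    for rewards in reward_groups:
        rewards = np.asarray(rewards, dtype=float)
        # Spread of rewards across rollouts, summed over reward dimensions.
        scores.append(rewards.std(axis=0).sum())
    order = np.argsort(scores)[::-1]                 # most informative first
    keep = max(1, int(len(reward_groups) * keep_ratio))
    return sorted(order[:keep].tolist())             # indices of prompts to train on


def task_based_curriculum(task_difficulties, step, total_steps):
    """Gradually admit harder sub-tasks as training progresses.

    task_difficulties: dict mapping task name -> difficulty in [0, 1].
    Returns the set of tasks whose difficulty is below the current threshold
    (a simple linear schedule, assumed for illustration only).
    """
    threshold = (step + 1) / total_steps
    return {t for t, d in task_difficulties.items() if d <= threshold}


if __name__ == "__main__":
    groups = [
        [[1.0, 0.5], [1.0, 0.5]],                    # identical rewards: no gradient signal
        [[0.0, 0.2], [1.0, 0.9]],                    # high spread: informative
        [[0.3, 0.4], [0.5, 0.6]],
    ]
    print(reward_based_dynamic_sample(groups, keep_ratio=0.67))   # -> [1, 2]
    tasks = {"single_call": 0.2, "multi_turn": 0.6, "long_horizon": 0.9}
    print(task_based_curriculum(tasks, step=4, total_steps=10))   # easier tasks first
```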
Paper Type: Long
Research Area: AI/LLM Agents
Research Area Keywords: AI/LLM Agents
Contribution Types: NLP engineering experiment, Publicly available software and/or pre-trained models
Languages Studied: English
Submission Number: 5198