Keywords: Curriculum Learning, Reasoning, Reinforcement Learning, Efficiency
TL;DR: We accelerate RL training of LLMs with an adaptive curriculum over prompt difficulty
Abstract: Training large language models with reinforcement learning (RL) against verifiable rewards significantly enhances their reasoning abilities, yet remains computationally expensive due to inefficient uniform prompt sampling. We introduce **Selective Prompting with Efficient Estimation of Difficulty (SPEED)**, an adaptive online RL curriculum that selectively chooses training examples of intermediate difficulty to maximize learning efficiency. Theoretically, we establish that intermediate-difficulty prompts improve the gradient estimator’s signal-to-noise ratio, accelerating convergence. Empirically, our procedure leads to 2× to 6× faster training without degrading accuracy, requires no manual tuning, and integrates seamlessly into standard RL algorithms.
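To make the core idea concrete, here is a minimal sketch of difficulty-based prompt selection: estimate each prompt's success rate with a few cheap rollouts and keep only intermediate-difficulty prompts for the RL update. This is an illustrative approximation, not the paper's actual SPEED procedure; the function names, thresholds, and the toy policy/verifier are all hypothetical.

```python
import random

def estimate_difficulty(prompt, policy, reward_fn, n_rollouts=4):
    """Estimate a prompt's success rate with a few cheap rollouts.
    `policy` and `reward_fn` are hypothetical stand-ins for the model
    and the verifiable reward checker."""
    successes = sum(reward_fn(prompt, policy(prompt)) for _ in range(n_rollouts))
    return successes / n_rollouts

def select_intermediate(prompts, policy, reward_fn, low=0.2, high=0.8):
    """Keep prompts of intermediate difficulty (neither always solved nor
    never solved), where the policy-gradient signal is strongest.
    The [low, high] band is an assumed example, not a value from the paper."""
    selected = []
    for p in prompts:
        rate = estimate_difficulty(p, policy, reward_fn)
        if low <= rate <= high:
            selected.append(p)
    return selected

# Toy usage with a random stand-in "policy" and a stand-in verifiable reward.
if __name__ == "__main__":
    prompts = list(range(100))
    policy = lambda p: random.random()                  # stand-in generator
    reward_fn = lambda p, y: int(y < (p % 10) / 10.0)   # stand-in verifier
    batch = select_intermediate(prompts, policy, reward_fn)
    print(f"kept {len(batch)} of {len(prompts)} prompts for the RL update")
```

The filtered batch would then be passed to a standard RL update (e.g., a policy-gradient step), which is how such a curriculum integrates into existing training loops.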
Submission Number: 29