Adaptive Curriculum Strategies: Stabilizing Reinforcement Learning for Large Language Models

ICLR 2026 Conference Submission 16430 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Mathematical Reasoning; Large Language Models; Reinforcement Learning
Abstract: Curriculum learning has shown promise for enhancing Large Language Models (LLMs) through progressive difficulty management, yet existing approaches suffer from instability when applied to reinforcement learning (RL). In particular, curriculum-based RL training exhibits catastrophic performance collapse during difficulty transitions, especially when models encounter samples beyond their current capabilities. This instability stems from rigid curriculum designs that fail to adapt to individual model characteristics and learning trajectories. To address these limitations, we propose Adaptive Curriculum Strategies (ACS), a framework that promotes stable and effective training throughout curriculum progression. Our approach introduces model-specific difficulty calibration, which adapts the curriculum to each model's capabilities, and "Guided Prompting", which transforms challenging samples to prevent training instability. Experiments demonstrate that ACS prevents the performance collapse observed in traditional curriculum RL training, achieving substantial improvements across five mathematical reasoning benchmarks while improving training stability.
Primary Area: foundation or frontier models, including LLMs
Submission Number: 16430