FastCuRL: Curriculum Reinforcement Learning with Stage-wise Context Scaling for Efficient Training R1-like Reasoning Models

ACL ARR 2025 May Submission1950 Authors

18 May 2025 (modified: 03 Jul 2025) · ACL ARR 2025 May Submission · CC BY 4.0
Abstract: Improving training efficiency remains one of the primary challenges in large-scale Reinforcement Learning (RL). In this paper, we investigate how \textbf{\textit{context length}} and \textbf{\textit{the complexity of training data}} influence \textit{the RL training process of R1-distilled small reasoning models, e.g., DeepSeek-R1-Distill-Qwen-1.5B}. \textit{Our experimental results reveal that:} \textit{(1) simply controlling the context length and curating the training data based on input prompt length can effectively improve the training efficiency of scaling RL, achieving better performance with more concise CoT;} \textit{(2) properly scaling the context length helps mitigate entropy collapse;} \textit{and (3) choosing an optimal context length can improve training efficiency and incentivize the model's chain-of-thought reasoning capabilities}. Inspired by these insights, we propose \textbf{\textsc{FastCuRL}}, a curriculum RL framework with stage-wise context scaling that achieves efficient training and concise CoT reasoning. Experimental results demonstrate that \textbf{\textsc{FastCuRL}-1.5B-V3} significantly outperforms state-of-the-art reasoning models on five competition-level benchmarks and achieves 49.6\% accuracy on AIME 2024. Furthermore, \textbf{\textsc{FastCuRL}-1.5B-Preview} surpasses DeepScaleR-1.5B-Preview on five benchmarks while using only a single node with 8 GPUs and 50\% of the training steps. The code, training data, and models will be publicly released.
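The core recipe the abstract describes, curating data by input prompt length and pairing each curriculum stage with a progressively larger context budget, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the stage boundaries, context lengths, and the `prompt_len` field are all illustrative assumptions.

```python
# Hypothetical sketch of stage-wise context scaling. The actual stage
# boundaries and context budgets used by FastCuRL are not stated here,
# so the numbers below are assumptions for illustration only.

def split_by_prompt_length(dataset, boundaries):
    """Partition examples into curriculum stages by prompt token length."""
    stages = [[] for _ in range(len(boundaries) + 1)]
    for example in dataset:
        n = example["prompt_len"]
        for i, bound in enumerate(boundaries):
            if n <= bound:
                stages[i].append(example)
                break
        else:  # longer than every boundary -> final (hardest) stage
            stages[-1].append(example)
    return stages

def curriculum_schedule(stages, context_lengths):
    """Pair each stage's data with a progressively larger context length."""
    return list(zip(stages, context_lengths))

# Toy usage with assumed prompt-length boundaries and context budgets.
data = [{"prompt_len": n} for n in (120, 480, 900, 2100)]
stages = split_by_prompt_length(data, boundaries=(256, 1024))
schedule = curriculum_schedule(stages, context_lengths=(8192, 16384, 24576))
for stage_data, ctx in schedule:
    # In real RL training, rollouts for this stage would be capped at
    # `ctx` generated tokens before advancing to the next stage.
    print(len(stage_data), ctx)
```

The design choice this illustrates: shorter prompts (and a tight context cap) in early stages keep rollouts cheap, while later stages unlock longer contexts only once the model benefits from them.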
Paper Type: Long
Research Area: Efficient/Low-Resource Methods for NLP
Research Area Keywords: Large Language Models, Reinforcement Learning
Contribution Types: Approaches to low-resource settings, Approaches to low-compute settings (efficiency)
Languages Studied: English
Keywords: Large Language Models, Reinforcement Learning
Submission Number: 1950