AREAL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning

Published: 11 Jun 2025, Last Modified: 10 Jul 2025
Venue: ES-FoMo III (Spotlight)
License: CC BY 4.0
Keywords: distributed system, reinforcement learning, large language model
TL;DR: We implement an efficient asynchronous RL system for LLM reasoning by algorithm-system co-design.
Abstract: Effective RL for LLMs requires massive parallelization and poses an urgent need for efficient training systems. Most existing large-scale RL systems for LLMs are synchronous, alternating between generation and training in a batch setting, where the rollouts in each training batch are generated by the same (or latest) model. This design stabilizes RL training but suffers from severe system-level inefficiency: generation must wait until the longest output in the batch is completed before the model can be updated, resulting in GPU underutilization. We present AReaL, a fully asynchronous RL system that completely decouples generation from training. Rollout workers in AReaL continuously generate new outputs without waiting, while training workers update the model whenever a batch of data is collected. To stabilize RL training, AReaL balances the workload of rollout and training workers to control data staleness and adopts a staleness-enhanced PPO variant to better handle outdated training samples. Extensive experiments on math and code reasoning benchmarks show that AReaL achieves up to a 2.57× training speedup over the best synchronous systems with the same number of GPUs, with matched or even improved final performance.
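To make the decoupling described in the abstract concrete, below is a minimal sketch (not the authors' implementation) of an asynchronous rollout/training loop with a staleness bound. Names such as `max_staleness`, `generate_rollout`, and `ppo_update` are hypothetical placeholders standing in for the LLM generation and staleness-enhanced PPO components.

```python
# Minimal sketch of decoupled rollout and training workers with staleness control.
# All names and values here are illustrative assumptions, not AReaL's actual API.
import queue
import threading
import time

max_staleness = 4          # assumed bound on how many versions old a rollout may be
batch_size = 8
policy_version = 0         # incremented by the trainer after each update
rollout_queue = queue.Queue(maxsize=64)
stop = threading.Event()

def generate_rollout(version):
    """Stand-in for LLM generation; returns a dummy sample tagged with the policy version."""
    time.sleep(0.01)
    return {"tokens": [0, 1, 2], "reward": 1.0, "version": version}

def rollout_worker():
    # Rollout workers never wait for training: they keep producing samples.
    while not stop.is_set():
        rollout_queue.put(generate_rollout(policy_version))

def ppo_update(batch):
    """Stand-in for a staleness-aware PPO step; here it only inspects sample ages."""
    return sum(policy_version - s["version"] for s in batch) / len(batch)

def trainer(num_steps=20):
    global policy_version
    for _ in range(num_steps):
        batch = []
        while len(batch) < batch_size:
            sample = rollout_queue.get()
            # Discard samples that are too stale relative to the current policy.
            if policy_version - sample["version"] <= max_staleness:
                batch.append(sample)
        ppo_update(batch)
        policy_version += 1   # the model updates as soon as a batch is ready
    stop.set()

workers = [threading.Thread(target=rollout_worker, daemon=True) for _ in range(4)]
for w in workers:
    w.start()
trainer()
```

In this sketch, generation and training proceed concurrently; the only coupling is the queue and the staleness check, which mirrors (in simplified form) how controlling data staleness can keep asynchronous PPO training stable.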
Submission Number: 44