AREAL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning

Published: 11 Jun 2025, Last Modified: 10 Jul 2025
Venue: ES-FoMo III (Spotlight)
License: CC BY 4.0
Keywords: distributed system, reinforcement learning, large language model
TL;DR: We implement an efficient asynchronous RL system for LLM reasoning by algorithm-system co-design.
Abstract: Effective RL for LLMs requires massive parallelization and poses an urgent need for efficient training systems. Most existing large-scale RL systems for LLMs are synchronous, alternating between generation and training in a batch setting, where the rollouts in each training batch are generated by the same (or latest) model. This design stabilizes RL training but suffers from severe system-level inefficiency: generation must wait until the longest output in the batch is completed before the model can be updated, resulting in GPU underutilization. We present AReaL, a fully asynchronous RL system that completely decouples generation from training. Rollout workers in AReaL continuously generate new outputs without waiting, while training workers update the model whenever a batch of data is collected. To stabilize RL training, AReaL balances the workload of rollout and training workers to control data staleness and adopts a staleness-enhanced PPO variant to better handle outdated training samples. Extensive experiments on math and code reasoning benchmarks show that AReaL achieves up to a 2.57× training speedup over the best synchronous systems with the same number of GPUs, with matched or even improved final performance.
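To make the decoupling described in the abstract concrete, below is a minimal sketch (not the authors' implementation) of an asynchronous rollout/training loop with a staleness bound. Names such as `max_staleness`, `generate_rollout`, and `ppo_update` are hypothetical placeholders standing in for the LLM generation and staleness-enhanced PPO components.

```python
# Minimal sketch of decoupled rollout and training workers with staleness control.
# All names and values here are illustrative assumptions, not AReaL's actual API.
import queue
import threading
import time

max_staleness = 4          # assumed bound on how many versions old a rollout may be
batch_size = 8
policy_version = 0         # incremented by the trainer after each update
rollout_queue = queue.Queue(maxsize=64)
stop = threading.Event()

def generate_rollout(version):
    """Stand-in for LLM generation; returns a dummy sample tagged with the policy version."""
    time.sleep(0.01)
    return {"tokens": [0, 1, 2], "reward": 1.0, "version": version}

def rollout_worker():
    # Rollout workers never wait for training: they keep producing samples.
    while not stop.is_set():
        rollout_queue.put(generate_rollout(policy_version))

def ppo_update(batch):
    """Stand-in for a staleness-aware PPO step; here it only inspects sample ages."""
    return sum(policy_version - s["version"] for s in batch) / len(batch)

def trainer(num_steps=20):
    global policy_version
    for _ in range(num_steps):
        batch = []
        while len(batch) < batch_size:
            sample = rollout_queue.get()
            # Discard samples that are too stale relative to the current policy.
            if policy_version - sample["version"] <= max_staleness:
                batch.append(sample)
        ppo_update(batch)
        policy_version += 1   # the model updates as soon as a batch is ready
    stop.set()

workers = [threading.Thread(target=rollout_worker, daemon=True) for _ in range(4)]
for w in workers:
    w.start()
trainer()
```

In this sketch, generation and training proceed concurrently; the only coupling is the queue and the staleness check, which mirrors (in simplified form) how controlling data staleness can keep asynchronous PPO training stable.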
Submission Number: 44