PRIME-RL: Async & Decentralized RL Training at Scale

Published: 22 Sept 2025, Last Modified: 25 Nov 2025 · ScaleOPT Poster · CC BY 4.0
Keywords: Async RL, Decentralized RL, Reinforcement Learning with Verifiable Rewards
TL;DR: PRIME-RL is an open-source, scalable reinforcement learning framework optimized for agentic RL and multi-turn tool use in decentralized settings.
Abstract: We present PRIME-RL, an open-source framework for large-scale reinforcement learning (RL). PRIME-RL is designed to scale seamlessly from a single node to thousands of GPUs, making it suitable for tinkering, research, and production-scale training. Tailored for agentic RL, it offers first-class support for multi-turn interactions and tool use through its asynchronous architecture. Environments are constructed using the verifiers library and integrated with the Environment Hub, so environment development and sourcing remain fully decoupled from the training infrastructure. To demonstrate the capabilities of PRIME-RL, we train DeepSeek-R1-Distill-Qwen-32B on chain-of-thought (CoT) math reasoning using 24 NVIDIA H200 GPUs. We measure an aggregate throughput of up to 30K tokens per second and reach a peak Model FLOPs Utilization (MFU) of 38.46%.
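To illustrate the decoupling the abstract describes, the sketch below defines a toy single-turn environment with the verifiers library, with no reference to any trainer. The class and argument names (SingleTurnEnv, Rubric, funcs) follow verifiers' public API as best understood, but exact signatures vary across versions and are assumptions here, not taken from the paper; the dataset and reward function are hypothetical.

```python
# Hypothetical sketch: a verifiable-reward environment built with the
# verifiers library, defined independently of the training infrastructure.
# Class/argument names are assumptions based on verifiers' public API and
# may differ across versions.
from datasets import Dataset
import verifiers as vf

# Toy dataset: prompts paired with ground-truth answers for verifiable rewards.
dataset = Dataset.from_list([
    {"question": "What is 2 + 2?", "answer": "4"},
])

def exact_match(completion, answer, **kwargs) -> float:
    # Reward 1.0 iff the ground-truth answer appears in the model output.
    text = completion if isinstance(completion, str) else completion[-1]["content"]
    return 1.0 if answer in text else 0.0

env = vf.SingleTurnEnv(
    dataset=dataset,
    rubric=vf.Rubric(funcs=[exact_match]),
)
```

Because the environment is an ordinary package, it can be published to and pulled from the Environment Hub without touching trainer code.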
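For readers unfamiliar with the metric, MFU relates achieved training FLOPs to the hardware's theoretical peak. Below is a back-of-the-envelope sketch, assuming the common 6ND FLOPs-per-token approximation and the H200's dense BF16 peak of roughly 989 TFLOP/s; the paper's own accounting (e.g., which GPUs are counted, whether rollout inference is included, and the instantaneous throughput at the peak) may differ, which is why the reported 38.46% peak exceeds this rough average.

```python
# Back-of-the-envelope MFU estimate from the reported aggregate throughput.
# Assumptions (not from the paper): 6*N*D training FLOPs per token, and
# ~989 TFLOP/s dense BF16 peak per H200.

PARAMS = 32e9                 # DeepSeek-R1-Distill-Qwen-32B parameter count
TOKENS_PER_SEC = 30e3         # aggregate throughput reported in the abstract
NUM_GPUS = 24
PEAK_FLOPS_PER_GPU = 989e12   # H200 BF16 dense peak (assumption)

achieved = 6 * PARAMS * TOKENS_PER_SEC   # achieved training FLOP/s
peak = NUM_GPUS * PEAK_FLOPS_PER_GPU     # cluster peak FLOP/s
print(f"MFU ~ {achieved / peak:.1%}")    # ~24% under these assumptions
```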
Submission Number: 30