PRIME-RL: Async & Decentralized RL Training at Scale

Published: 22 Sept 2025, Last Modified: 25 Nov 2025 · ScaleOPT Poster · CC BY 4.0
Keywords: Async RL, Decentralized RL, Reinforcement Learning with Verifiable Rewards
TL;DR: PRIME-RL is an open-source, scalable reinforcement learning framework optimized for agentic RL and multi-turn tool use in decentralized settings.
Abstract: We present PRIME-RL, an open-source framework for large-scale reinforcement learning (RL). PRIME-RL is designed to scale seamlessly from a single node to thousands of GPUs, making it suitable for tinkering, research, and production-scale training. Tailored for agentic RL, it offers first-class support for multi-turn interactions and tool use through its asynchronous architecture. Environments are constructed using the verifiers library and integrated with the Environment Hub, so environment development and sourcing remain fully decoupled from the training infrastructure. To demonstrate the capabilities of PRIME-RL, we train DeepSeek-R1-Distill-Qwen-32B on chain-of-thought (CoT) math reasoning using 24 NVIDIA H200 GPUs. We measure an aggregate throughput of up to 30K tokens per second and reach a peak Model FLOPs Utilization (MFU) of 38.46%.
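To illustrate the decoupling the abstract describes, the sketch below defines a toy single-turn environment with the verifiers library, with no reference to any trainer. The class and argument names (SingleTurnEnv, Rubric, funcs) follow verifiers' public API as best understood, but exact signatures vary across versions and are assumptions here, not taken from the paper; the dataset and reward function are hypothetical.

```python
# Hypothetical sketch: a verifiable-reward environment built with the
# verifiers library, defined independently of the training infrastructure.
# Class/argument names are assumptions based on verifiers' public API and
# may differ across versions.
from datasets import Dataset
import verifiers as vf

# Toy dataset: prompts paired with ground-truth answers for verifiable rewards.
dataset = Dataset.from_list([
    {"question": "What is 2 + 2?", "answer": "4"},
])

def exact_match(completion, answer, **kwargs) -> float:
    # Reward 1.0 iff the ground-truth answer appears in the model output.
    text = completion if isinstance(completion, str) else completion[-1]["content"]
    return 1.0 if answer in text else 0.0

env = vf.SingleTurnEnv(
    dataset=dataset,
    rubric=vf.Rubric(funcs=[exact_match]),
)
```

Because the environment is an ordinary package, it can be published to and pulled from the Environment Hub without touching trainer code.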
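For readers unfamiliar with the metric, MFU relates achieved training FLOPs to the hardware's theoretical peak. Below is a back-of-the-envelope sketch, assuming the common 6ND FLOPs-per-token approximation and the H200's dense BF16 peak of roughly 989 TFLOP/s; the paper's own accounting (e.g., which GPUs are counted, whether rollout inference is included, and the instantaneous throughput at the peak) may differ, which is why the reported 38.46% peak exceeds this rough average.

```python
# Back-of-the-envelope MFU estimate from the reported aggregate throughput.
# Assumptions (not from the paper): 6*N*D training FLOPs per token, and
# ~989 TFLOP/s dense BF16 peak per H200.

PARAMS = 32e9                 # DeepSeek-R1-Distill-Qwen-32B parameter count
TOKENS_PER_SEC = 30e3         # aggregate throughput reported in the abstract
NUM_GPUS = 24
PEAK_FLOPS_PER_GPU = 989e12   # H200 BF16 dense peak (assumption)

achieved = 6 * PARAMS * TOKENS_PER_SEC   # achieved training FLOP/s
peak = NUM_GPUS * PEAK_FLOPS_PER_GPU     # cluster peak FLOP/s
print(f"MFU ~ {achieved / peak:.1%}")    # ~24% under these assumptions
```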
Submission Number: 30