ARLArena: A Unified Framework for Stable Agentic Reinforcement Learning

Published: 23 May 2026, Last Modified: 23 May 2026 · ICML 2026 AIWILD · CC BY 4.0
Keywords: Agentic RL
Abstract: Agentic reinforcement learning (ARL) has rapidly gained attention as a promising paradigm for training agents to solve complex, multi-step interactive tasks. Despite encouraging early results, ARL remains highly unstable, often leading to training collapse. This instability limits scalability to larger environments and longer interaction horizons, and constrains systematic exploration of algorithmic design choices. In this paper, we first propose ARLArena, a stable training recipe and systematic analysis framework that examines training stability in a controlled and reproducible setting. ARLArena first constructs a clean and standardized testbed. Then, we decompose policy gradient into four core design dimensions and assess the performance and stability of each dimension. Through this fine-grained analysis, we distill a unified perspective on ARL and propose SAMPO, a stable agentic policy optimization method designed to mitigate the dominant sources of instability in ARL. Empirically, SAMPO achieves consistently stable training and strong performance across diverse agentic tasks. Overall, this study provides a unifying policy gradient perspective for ARL and offers practical guidance for building stable and reproducible LLM-based agent training pipelines.
Track: Regular Paper (9 pages)
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 180