ARLArena: A Unified Framework for Stable Agentic Reinforcement Learning
Keywords: Agentic RL
Abstract: Agentic reinforcement learning (ARL) has rapidly gained attention as a promising paradigm for training agents to solve complex, multi-step interactive tasks. Despite encouraging early results, ARL remains highly unstable and often leads to training collapse. This instability limits scalability to larger environments and longer interaction horizons, and constrains systematic exploration of algorithmic design choices. In this paper, we propose ARLArena, a stable training recipe and systematic analysis framework that examines training stability in a controlled and reproducible setting. ARLArena first constructs a clean, standardized testbed. We then decompose the policy gradient into four core design dimensions and assess the performance and stability of each. Through this fine-grained analysis, we distill a unified perspective on ARL and propose SAMPO, a stable agentic policy optimization method designed to mitigate the dominant sources of instability in ARL. Empirically, SAMPO achieves consistently stable training and strong performance across diverse agentic tasks. Overall, this study provides a unifying policy gradient perspective for ARL and offers practical guidance for building stable and reproducible LLM-based agent training pipelines.
Track: Regular Paper (9 pages)
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 180