Keywords: Agentic AI, Reinforcement Learning, LLM Reasoning, Multi-turn Interaction
TL;DR: We provide a systematic empirical analysis and key takeaways on the factors that matter in practice for making multi-turn RL work for LLM agent learning.
Abstract: We study how to train large language models (LLMs) as autonomous agents that act over multiple turns in agentic environments. While reinforcement learning has driven strong single-turn reasoning, extending it to multi-turn environments introduces new challenges that remain to be addressed. We formulate multi-turn agentic RL with dense per-turn rewards and token-level credit assignment, and provide a systematic analysis of the impact of three RL pillars -- environment, policy, and reward -- on multi-turn RL. In interactive text environments (TextWorld, ALFWorld), we examine scaling with environment complexity and generalization across tasks; we analyze the role of model priors in subsequent multi-turn RL training; and we compare the impact of sparse and dense per-turn rewards on learning. We also provide an extensible code framework for multi-turn agentic RL. Together, our formulation, analysis, and toolkit offer practical guidance for building LLM agents capable of robust multi-turn decision making in agentic environments.
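To make the formulation concrete, below is a minimal sketch (not the authors' implementation) of how dense per-turn rewards can be broadcast to token-level credit assignment for policy-gradient training. The names `turn_token_spans` and `gamma`, and the span layout in the example, are illustrative assumptions rather than details from the paper.

```python
from typing import List, Tuple
import torch

def token_level_returns(
    turn_rewards: List[float],                 # one scalar reward per turn
    turn_token_spans: List[Tuple[int, int]],   # (start, end) token indices of each turn's action
    num_tokens: int,
    gamma: float = 1.0,                        # discount factor across turns
) -> torch.Tensor:
    """Assign each action token the discounted return of the turn that produced it."""
    # Discounted return-to-go over turns: G_t = r_t + gamma * G_{t+1}
    returns = [0.0] * len(turn_rewards)
    running = 0.0
    for t in reversed(range(len(turn_rewards))):
        running = turn_rewards[t] + gamma * running
        returns[t] = running

    # Broadcast each turn's return onto the tokens that turn generated;
    # tokens outside any action span (e.g. environment observations) stay 0
    # and would typically be masked out of the policy loss.
    token_returns = torch.zeros(num_tokens)
    for (start, end), g in zip(turn_token_spans, returns):
        token_returns[start:end] = g
    return token_returns

# Example: 3 turns over a 30-token trajectory, sparse vs. dense per-turn rewards.
spans = [(0, 10), (12, 20), (22, 30)]
sparse = token_level_returns([0.0, 0.0, 1.0], spans, num_tokens=30)  # terminal-only reward
dense  = token_level_returns([0.2, 0.3, 1.0], spans, num_tokens=30)  # per-turn shaping
```

Under sparse rewards only the final turn's tokens receive non-zero credit (up to discounting), whereas dense per-turn rewards give every turn's tokens an informative signal, which is the contrast the abstract's reward analysis studies.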
Submission Number: 127