Keywords: Agentic AI, Reinforcement Learning, LLM Reasoning, Multi-turn Interaction
TL;DR: We provide a systematic empirical analysis and key takeaways on the factors that matter in practice for making multi-turn RL work for LLM agent learning.
Abstract: We study how to train large language models (LLMs) as autonomous agents that act over multiple turns in agentic environments. While reinforcement learning has driven strong single-turn reasoning, extending it to multi-turn environments introduces new challenges that remain to be addressed. We formulate multi-turn agentic RL with dense per-turn rewards and token-level credit assignment, and provide a systematic analysis of the impact of three RL pillars -- environment, policy, and reward -- on multi-turn RL. In interactive text environments (TextWorld, ALFWorld), we examine scaling with environment complexity and generalization across tasks; we analyze the role of model priors in subsequent multi-turn RL training; and we compare the impact of sparse and dense per-turn rewards on learning. We also provide an extensible code framework for multi-turn agentic RL. Together, our formulation, analysis, and toolkit offer practical guidance for building LLM agents capable of robust multi-turn decision making in agentic environments.
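To make the formulation concrete, below is a minimal sketch (not the authors' implementation) of how dense per-turn rewards can be broadcast to token-level credit assignment for policy-gradient training. The names `turn_token_spans` and `gamma`, and the span layout in the example, are illustrative assumptions rather than details from the paper.

```python
from typing import List, Tuple
import torch

def token_level_returns(
    turn_rewards: List[float],                 # one scalar reward per turn
    turn_token_spans: List[Tuple[int, int]],   # (start, end) token indices of each turn's action
    num_tokens: int,
    gamma: float = 1.0,                        # discount factor across turns
) -> torch.Tensor:
    """Assign each action token the discounted return of the turn that produced it."""
    # Discounted return-to-go over turns: G_t = r_t + gamma * G_{t+1}
    returns = [0.0] * len(turn_rewards)
    running = 0.0
    for t in reversed(range(len(turn_rewards))):
        running = turn_rewards[t] + gamma * running
        returns[t] = running

    # Broadcast each turn's return onto the tokens that turn generated;
    # tokens outside any action span (e.g. environment observations) stay 0
    # and would typically be masked out of the policy loss.
    token_returns = torch.zeros(num_tokens)
    for (start, end), g in zip(turn_token_spans, returns):
        token_returns[start:end] = g
    return token_returns

# Example: 3 turns over a 30-token trajectory, sparse vs. dense per-turn rewards.
spans = [(0, 10), (12, 20), (22, 30)]
sparse = token_level_returns([0.0, 0.0, 1.0], spans, num_tokens=30)  # terminal-only reward
dense  = token_level_returns([0.2, 0.3, 1.0], spans, num_tokens=30)  # per-turn shaping
```

Under sparse rewards only the final turn's tokens receive non-zero credit (up to discounting), whereas dense per-turn rewards give every turn's tokens an informative signal, which is the contrast the abstract's reward analysis studies.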
Submission Number: 127