Keywords: Multi-turn RL, LLM Agents
Abstract: Finetuning large language model (LLM) agents with multi-turn reinforcement learning (RL) is a promising direction. However, applying multi-turn RL to agentic tasks presents unique challenges not typically encountered in reasoning tasks such as solving math problems. These include long interaction histories that hinder relevant context retrieval, sparse rewards that slow down learning, and variable trajectory lengths that reduce training efficiency. To address these challenges, we propose Context-lite Multi-turn RL, a framework that incorporates:
(1) a customizable agent memory mechanism, which lets the agent flexibly include varying lengths of interaction history in each turn's prompt according to task requirements, and
(2) dual-discounting GAE, which decouples step-level and token-level credit assignment (a sketch follows the abstract).
Experiments demonstrate that our method surpasses the zero-shot performance of state-of-the-art LLMs across four BabyAI scenarios, while also achieving greater efficiency and effectiveness than variants lacking either the memory mechanism or dual-discounting GAE.
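The abstract does not spell out the dual-discounting recursion, so the following is only a minimal sketch of one plausible formulation: advantages are propagated with a token-level discount and lambda within a turn, and a separate step-level discount and lambda across turn boundaries, so that credit assignment over environment steps is decoupled from credit assignment over tokens. The function name, signature, and default discount values are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def dual_discount_gae(rewards, values, turn_ids,
                      gamma_step=0.99, lam_step=0.95,
                      gamma_token=1.0, lam_token=1.0):
    """Hypothetical sketch of dual-discounting GAE (not the paper's exact method).

    rewards:  per-token rewards; for sparse tasks these are typically
              non-zero only on the last token of a turn or episode.
    values:   per-token value estimates (episode assumed to terminate,
              so the bootstrap value after the final token is 0).
    turn_ids: integer turn index for each token, used to detect turn boundaries.
    """
    T = len(rewards)
    advantages = np.zeros(T)
    gae = 0.0
    for t in reversed(range(T)):
        next_value = values[t + 1] if t + 1 < T else 0.0
        # If the next token starts a new turn (or the episode ends),
        # apply the step-level discount; otherwise the token-level one.
        boundary = (t + 1 == T) or (turn_ids[t + 1] != turn_ids[t])
        gamma = gamma_step if boundary else gamma_token
        lam = lam_step if boundary else lam_token
        delta = rewards[t] + gamma * next_value - values[t]
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    return advantages
```

With gamma_token = lam_token = 1.0, tokens within a turn share credit essentially undiscounted, while gamma_step and lam_step control how reward propagates backwards across turns; this is one way to realize the step-level versus token-level decoupling described above.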
Submission Number: 107