Keywords: Large Language Models (LLMs), Planning, reinforcement learning, interactive, embodied
Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities across diverse domains, yet they struggle with agent task planning in dynamic environments that require continuous observation and sequential decision-making. Current methods generate static action sequences from pre-trained knowledge without learning from environmental feedback, limiting their effectiveness in partially observable settings. We present Interactive Planner-R1, a novel trajectory-level reinforcement learning framework that enables LLMs to develop interactive planning capabilities through autonomous environmental exploration. Our approach addresses three key challenges: (1) limited exploration diversity, via multi-trajectory autonomous exploration through parallel group rollouts; (2) sparse reward signals, via a completion-driven reward architecture that promotes genuine environmental understanding; and (3) single-step optimization constraints, via Interactive Policy Optimization (IPO), which extends group-relative policy optimization to multi-step trajectory learning. Extensive experiments on ALFWorld and ScienceWorld demonstrate that Interactive Planner-R1 achieves substantial improvements over existing approaches, reaching a 97.55\% completion rate on ALFWorld and 79.92\% on ScienceWorld, and generalizes strongly, with only a 3.33\% performance gap in unseen environments. Our work establishes a new paradigm for LLM-based interactive planning through trajectory-level policy learning.
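As a rough illustration of the group-relative idea behind IPO, the sketch below normalizes each trajectory's return against its parallel rollout group (the standard GRPO-style advantage estimate, here applied at the trajectory level). This is a hypothetical minimal example; the function name, reward values, and omission of the clipped policy-gradient objective are all assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch: group-relative advantage estimation over trajectories.
# Each of G parallel rollouts of the same task yields one scalar return
# (e.g., a completion-driven reward); advantages are z-scores within the group.
from statistics import mean, pstdev

def group_relative_advantages(group_returns, eps=1e-8):
    """Normalize each trajectory's return against its rollout group."""
    mu = mean(group_returns)        # group baseline
    sigma = pstdev(group_returns)   # group spread
    return [(r - mu) / (sigma + eps) for r in group_returns]

# Example: 4 parallel rollouts; successful trajectories score above the mean
returns = [1.0, 0.0, 1.0, 0.2]
advantages = group_relative_advantages(returns)
# Trajectories that beat the group average get positive advantage,
# so their actions are reinforced across every step of the trajectory.
```

In a multi-step setting, the same trajectory-level advantage would typically be broadcast to every action token in that trajectory when computing the policy-gradient update.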
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: Large Language Models (LLMs), Planning, reinforcement learning, interactive, embodied
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 6220