Keywords: LLM Agents, Hierarchical Reinforcement Learning
Abstract: Large language model (LLM) agents have demonstrated strong capabilities in complex interactive decision-making tasks.
However, existing reinforcement learning (RL) approaches for LLM agents typically rely on full interaction histories, resulting in high computational cost and limited scalability.
In this paper, we propose **STEP-HRL**, a hierarchical reinforcement learning (HRL) framework that enables step-level learning in LLM agents without relying on full interaction histories.
STEP-HRL structures tasks hierarchically, using completed subtasks to represent the *global progress* of the overall task. It also introduces a *local progress* module that iteratively and selectively summarizes the interaction history within each subtask into a compact summary.
Together, these components yield augmented step-level transitions for both high-level and low-level policies, enabling effective step-level policy optimization.
Experimental results on the ScienceWorld and ALFWorld benchmarks consistently demonstrate that STEP-HRL substantially outperforms baselines in both performance and generalization while reducing token usage.
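To make the abstract's core idea concrete, here is a minimal, hypothetical sketch of the augmented step-level transition it describes: the full interaction history is replaced by (a) global progress, the list of completed subtasks, and (b) local progress, a compact summary of interaction within the current subtask. All class, field, and function names are illustrative assumptions, not the paper's actual interface.

```python
from dataclasses import dataclass

# Hypothetical data structure for an augmented step-level transition
# (names are illustrative; the paper's actual design may differ).
@dataclass
class StepTransition:
    observation: str          # current environment observation
    global_progress: list     # completed subtasks so far (global progress)
    local_progress: str       # compact summary of the current subtask's history
    action: str               # action taken at this step
    reward: float             # step-level reward


def build_prompt(t: StepTransition) -> str:
    """Assemble the compact context a step-level policy would condition on,
    in place of the full interaction history."""
    done = "; ".join(t.global_progress) or "none"
    return (
        f"Completed subtasks: {done}\n"
        f"Current subtask summary: {t.local_progress}\n"
        f"Observation: {t.observation}"
    )


t = StepTransition(
    observation="You see a closed fridge.",
    global_progress=["find the kitchen"],
    local_progress="Moved to the fridge; it is still closed.",
    action="open fridge",
    reward=0.0,
)
print(build_prompt(t))
```

Under this sketch, each transition is self-contained, so a policy can be optimized per step without replaying the full trajectory, which is the source of the token savings the abstract claims.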
Paper Type: Long
Research Area: AI/LLM Agents
Research Area Keywords: AI / LLM Agents
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 9894