Keywords: LLM Agents, Hierarchical Reinforcement Learning
Abstract: Large language model (LLM) agents have demonstrated strong capabilities in complex interactive decision-making tasks.
However, existing reinforcement learning (RL) approaches for LLM agents typically rely on full interaction histories, resulting in high computational cost and limited scalability.
In this paper, we propose **STEP-HRL**, a hierarchical reinforcement learning (HRL) framework that enables step-level learning in LLM agents without relying on full interaction histories.
STEP-HRL structures tasks hierarchically, using completed subtasks to represent the *global progress* of the overall task. It also introduces a *local progress* module that iteratively and selectively summarizes the interaction history within each subtask into a compact summary.
Together, these components yield augmented step-level transitions for both high-level and low-level policies, enabling effective step-level policy optimization.
Experimental results on the ScienceWorld and ALFWorld benchmarks consistently demonstrate that STEP-HRL substantially outperforms baselines in both performance and generalization while reducing token usage.
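To make the abstract's core idea concrete, here is a minimal, hypothetical sketch of the augmented step-level transition it describes: the full interaction history is replaced by (a) global progress, the list of completed subtasks, and (b) local progress, a compact summary of interaction within the current subtask. All class, field, and function names are illustrative assumptions, not the paper's actual interface.

```python
from dataclasses import dataclass

# Hypothetical data structure for an augmented step-level transition
# (names are illustrative; the paper's actual design may differ).
@dataclass
class StepTransition:
    observation: str          # current environment observation
    global_progress: list     # completed subtasks so far (global progress)
    local_progress: str       # compact summary of the current subtask's history
    action: str               # action taken at this step
    reward: float             # step-level reward


def build_prompt(t: StepTransition) -> str:
    """Assemble the compact context a step-level policy would condition on,
    in place of the full interaction history."""
    done = "; ".join(t.global_progress) or "none"
    return (
        f"Completed subtasks: {done}\n"
        f"Current subtask summary: {t.local_progress}\n"
        f"Observation: {t.observation}"
    )


t = StepTransition(
    observation="You see a closed fridge.",
    global_progress=["find the kitchen"],
    local_progress="Moved to the fridge; it is still closed.",
    action="open fridge",
    reward=0.0,
)
print(build_prompt(t))
```

Under this sketch, each transition is self-contained, so a policy can be optimized per step without replaying the full trajectory, which is the source of the token savings the abstract claims.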
Paper Type: Long
Research Area: AI/LLM Agents
Research Area Keywords: AI / LLM Agents
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 9894