Keywords: Hierarchical Agent, Vision-Language Model, Minecraft
Abstract: Prevailing autonomous agents are often constrained by a single, predefined action space, which limits their generalization across diverse tasks and can introduce compounding errors through decoupled policy execution. To address these limitations, we introduce the Deep Hierarchical Agent (DeepHA), a unified architecture that operates over a mixture of heterogeneous action spaces, flexibly generating actions ranging from high-level semantic skills to low-level motor controls. We further propose a Chain-of-Action (CoA) reasoning framework, which enables the agent to use higher-level abstract actions as structured "thoughts" that guide the generation of subsequent, more granular actions. To manage the computational demands of this deep reasoning in long-horizon tasks, we develop a memory-efficient mechanism that dynamically compresses historical context and leverages Key-Value (KV) caching, reducing context length by approximately 75% without sacrificing performance. We conduct extensive evaluations on a new, large-scale benchmark of over 800 diverse Minecraft tasks. Results show that DeepHA significantly outperforms prior methods, establishing a new state of the art and demonstrating superior generalization, particularly on complex, multi-step planning tasks. Our work presents a novel, unified framework for building more capable and efficient autonomous agents.
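To make the CoA idea concrete, here is a minimal, hypothetical Python sketch of one decoding step: a single policy first emits a high-level skill that acts as a structured "thought", then conditions progressively more granular actions on it, and compresses stale history to bound context growth. All names (`MockPolicy`, `generate`, `compress`, `keep_ratio`) are illustrative assumptions, not the paper's actual API.

```python
# Hypothetical sketch of one Chain-of-Action (CoA) step; none of these
# names come from the paper itself.

class MockPolicy:
    """Stand-in for a unified policy over heterogeneous action spaces."""

    def encode(self, observation: str) -> list[str]:
        return [f"obs:{observation}"]

    def generate(self, context: list[str], space: str) -> list[str]:
        # A real model would decode tokens conditioned on `context`;
        # here we just tag the action with its action space.
        return [f"{space}:step{len(context)}"]

    def compress(self, context: list[str], keep_ratio: float) -> list[str]:
        # Summarize stale history down to ~keep_ratio of its length,
        # mimicking the ~75% context reduction described in the abstract.
        keep = max(1, int(len(context) * keep_ratio))
        return ["summary:<compressed-history>"] + context[-keep:]


def chain_of_action_step(model: MockPolicy, ctx: list[str],
                         observation: str, max_len: int = 2048) -> str:
    """Decode coarse-to-fine: the skill 'thought' guides lower-level actions."""
    ctx += model.encode(observation)
    for space in ("skill", "plan", "motor"):  # high level conditions low level
        ctx += model.generate(ctx, space)
    if len(ctx) > max_len:                    # bound long-horizon context growth
        ctx[:] = model.compress(ctx, keep_ratio=0.25)
    return ctx[-1]                            # lowest-level (motor) action to execute
```

In a real system the retained suffix of the compressed context could keep its KV cache entries, so only the summary prefix needs re-encoding; that detail is elided here.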
Primary Area: reinforcement learning
Submission Number: 12632