Divide and Conquer: Grounding LLMs as Efficient Decision-Making Agents via Offline Hierarchical Reinforcement Learning

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: We leverage offline RL with a parameter-efficient and generally applicable hierarchy to ground LLMs as efficient decision-making agents.
Abstract: While showing sophisticated reasoning abilities, large language models (LLMs) still struggle with long-horizon decision-making tasks due to deficient exploration and long-term credit assignment, especially in sparse-reward scenarios. Inspired by the divide-and-conquer principle, we propose an innovative framework **GLIDER** (**G**rounding **L**anguage Models as Eff**I**cient **D**ecision-Making Agents via Offline Hi**E**rarchical **R**einforcement Learning) that introduces a parameter-efficient and generally applicable hierarchy to LLM policies. We develop a scheme where the low-level controller is supervised with abstract, step-by-step plans that are learned by and issued from the high-level policy. This design decomposes complicated problems into a series of coherent chain-of-thought reasoning sub-tasks, providing flexible temporal abstraction to significantly enhance exploration and learning for long-horizon tasks. Furthermore, GLIDER facilitates fast online adaptation to non-stationary environments owing to the strong transferability of its task-agnostic low-level skills. Experiments on the ScienceWorld and ALFWorld benchmarks show that GLIDER achieves consistent performance gains, along with enhanced generalization capabilities.
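The hierarchical control flow the abstract describes can be sketched as a simple two-level loop. This is an illustrative toy, not the authors' implementation: `high_level_policy`, `low_level_policy`, and the hard-coded plan table are hypothetical stand-ins for the LLM-backed components trained with offline RL in the paper.

```python
# Toy sketch of a GLIDER-style hierarchy (hypothetical stand-ins, not the paper's code):
# a high-level policy proposes the next abstract sub-task given the task and progress
# so far; a low-level controller grounds each sub-task into primitive actions.

def high_level_policy(task, completed):
    """Stub planner: return the next unfinished sub-task, or None when done.
    In GLIDER this would be an LLM policy trained with offline RL."""
    plans = {"make tea": ["boil water", "steep leaves", "pour cup"]}
    remaining = [p for p in plans[task] if p not in completed]
    return remaining[0] if remaining else None

def low_level_policy(sub_task):
    """Stub controller: map an abstract sub-task to primitive actions.
    In GLIDER this would be a task-agnostic low-level LLM skill."""
    return [f"act:{sub_task}:{step}" for step in range(2)]

def run_episode(task):
    """Alternate planning and execution until the plan is exhausted."""
    completed, trace = [], []
    while True:
        sub_task = high_level_policy(task, completed)
        if sub_task is None:
            break  # high-level policy signals task completion
        trace.extend(low_level_policy(sub_task))  # execute the sub-task
        completed.append(sub_task)
    return completed, trace

completed, trace = run_episode("make tea")
```

The temporal abstraction is visible in the loop structure: the outer loop operates over sub-tasks while the inner call operates over primitive actions, so credit assignment at the high level spans far fewer decisions than the raw action horizon.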
Lay Summary: Large language models (LLMs) have difficulty handling complex decision-making tasks, especially when feedback is limited. They often get lost in long-term planning and struggle to explore effectively, like a chess player who can't think multiple moves ahead. We developed GLIDER, a framework that breaks down complex tasks into smaller, manageable steps. Like a skilled manager delegating tasks, GLIDER uses a two-level system where high-level planning guides step-by-step execution. This approach helps LLMs tackle challenging tasks more efficiently and adapt to new situations better, showing significant improvements in virtual environments that test reasoning and problem-solving abilities.
Link To Code: https://github.com/NJU-RL/GLIDER
Primary Area: Deep Learning->Large Language Models
Keywords: Language agents, Hierarchical reinforcement learning, Offline reinforcement learning
Submission Number: 1231