Keywords: Active imitation learning, Hierarchical reinforcement learning, LLM for planning
TL;DR: We combine LLMs and RL for long-horizon planning in hierarchical reinforcement learning, exploiting an emergent symbolic abstraction while accounting for the limitations of LLMs' internal world models.
Abstract: Large Language Models (LLMs) show potential for interacting with reinforcement learning (RL) agents, but the main challenge is to align the world model learned by the agent with a representation compatible with LLMs. We address this problem with an algorithm named SGIM-STAR, which builds a discrete world representation online through RL exploration. It is a hierarchical RL method that augments STAR with a partition-wise, learning-progress–driven switch between a learned Q-learning Navigator and an LLM Navigator. The agent constructs a discrete reachability-based partition online and uses intrinsic motivation to query the LLM only when beneficial, defaulting to the learned navigator otherwise. This makes LLM usage cost-aware: the learned navigator dominates early, and the LLM is leveraged as the representation matures. On AntMaze, SGIM-STAR achieves the best and most stable success rate among STAR, LLM-only, and a non-partitioned adaptive variant, avoiding mid-training collapses while reducing LLM calls. These results demonstrate a practical fusion of LLMs with emergent symbolic world models for long-horizon tasks.
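The sketch below is a minimal, illustrative reading of the partition-wise, learning-progress–driven switch described in the abstract; it is not the authors' implementation. All names (`LearningProgressSwitch`, `window`, `threshold`) and the specific switching criterion (query the LLM when the learned navigator's progress stalls in a partition) are assumptions made for illustration only.

```python
# Illustrative sketch of a per-partition, learning-progress-driven switch
# between a learned Q-learning navigator and an LLM navigator.
# Hypothetical names and criterion; the paper's exact mechanism may differ.
from collections import defaultdict, deque


class LearningProgressSwitch:
    """Chooses, per partition, whether to query the LLM navigator."""

    def __init__(self, window: int = 20, threshold: float = 0.05):
        self.window = window        # recent outcomes tracked per partition
        self.threshold = threshold  # minimum progress to keep the Q-navigator
        self.history = defaultdict(lambda: deque(maxlen=window))

    def record(self, partition_id, success: bool) -> None:
        """Log the outcome of a subgoal attempt inside a partition."""
        self.history[partition_id].append(1.0 if success else 0.0)

    def learning_progress(self, partition_id) -> float:
        """Recent minus older success rate: a common learning-progress proxy."""
        h = list(self.history[partition_id])
        if len(h) < 4:
            return 0.0
        half = len(h) // 2
        return sum(h[half:]) / (len(h) - half) - sum(h[:half]) / half

    def use_llm(self, partition_id) -> bool:
        """Query the LLM only when the learned navigator has stalled here."""
        return self.learning_progress(partition_id) < self.threshold


if __name__ == "__main__":
    switch = LearningProgressSwitch()
    for outcome in [False, False, True, False, False, False, False, False]:
        switch.record("partition_3", outcome)
    print("query LLM for partition_3:", switch.use_llm("partition_3"))
```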
Submission Type: Research Paper (4-9 Pages)
Submission Number: 81