Tree-of-Options: Temporally Extended World Modeling, Planning, and Execution with Large Language Models

17 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: LLM world model, MCTS, Minecraft
Abstract: Because they embed commonsense knowledge, Large Language Models (LLMs) have been repurposed as world models that principled planning algorithms such as Monte Carlo Tree Search (MCTS) can exploit. Prior work has been limited to using LLMs for low-level world modeling, i.e., predicting the immediate next world state and reward after each primitive action. This makes them unfit for long-horizon tasks, where prediction errors compound quickly over time. This work develops an alternative framework in which LLMs perform world modeling over temporally extended actions (options), sidestepping their limited precision at small temporal scales. At this level of temporal abstraction, LLMs are also competent at proposing reasonable options, which enables effective planning with MCTS. To execute the planned options with primitive actions, we again turn to LLMs, prompting them to synthesize code that implements option-conditioned policies, a task at which LLMs are known to excel. Empirical results in Minecraft show that this approach substantially improves performance over prior LLM-based planners on long-horizon, compositional tasks for embodied agents.
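The planning loop described in the abstract (an option-level world model plus option proposals, searched with MCTS) can be illustrated with a standard UCT-style MCTS whose actions are options rather than primitive actions. This is a minimal sketch under stated assumptions, not the authors' implementation: both LLM roles are replaced by hypothetical stubs, `propose_options` (option suggestion) and `predict_outcome` (option-level next state and reward), over a hypothetical toy crafting table `OPTION_MODEL`. The discount `gamma`, exploration constant `c`, depth limit, and iteration count are illustrative choices, and the option-conditioned code policies used for execution are not modeled here.

```python
import math
import random
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical toy option model standing in for the LLM world model:
# option -> (prerequisite items, item produced). Items are never consumed.
OPTION_MODEL: Dict[str, Tuple[frozenset, str]] = {
    "collect_wood": (frozenset(), "wood"),
    "collect_dirt": (frozenset(), "dirt"),  # distractor option
    "craft_planks": (frozenset({"wood"}), "planks"),
    "craft_table": (frozenset({"planks"}), "table"),
    "craft_pickaxe": (frozenset({"planks", "table"}), "pickaxe"),
}
GOAL = "pickaxe"


def propose_options(state: frozenset) -> List[str]:
    """Hypothetical stub for LLM option proposal: options whose prerequisites hold."""
    if GOAL in state:  # goal reached: treat the state as terminal
        return []
    return [o for o, (pre, out) in OPTION_MODEL.items()
            if pre <= state and out not in state]


def predict_outcome(state: frozenset, option: str) -> Tuple[frozenset, float]:
    """Hypothetical stub for option-level LLM world modeling: next state, reward."""
    _, out = OPTION_MODEL[option]
    return state | {out}, (1.0 if out == GOAL else 0.0)


@dataclass
class Node:
    state: frozenset
    parent: Optional["Node"] = field(default=None, repr=False)
    option: Optional[str] = None  # option that led here from the parent
    reward: float = 0.0           # predicted reward for taking that option
    untried: List[str] = field(default_factory=list)
    children: Dict[str, "Node"] = field(default_factory=dict)
    visits: int = 0
    value: float = 0.0            # sum of discounted returns backed up here


def uct_child(node: Node, c: float) -> Node:
    # UCB1 over children; every existing child has visits >= 1.
    return max(node.children.values(),
               key=lambda ch: ch.value / ch.visits
               + c * math.sqrt(math.log(node.visits) / ch.visits))


def simulate(state, depth, max_depth, gamma, rng) -> float:
    # Random rollout over options, scored by the stub world model.
    ret = 0.0
    while depth < max_depth:
        opts = propose_options(state)
        if not opts:
            break
        state, r = predict_outcome(state, rng.choice(opts))
        ret += (gamma ** depth) * r
        depth += 1
    return ret


def mcts(root_state, iterations=300, max_depth=6, gamma=0.95, c=1.4, seed=0):
    rng = random.Random(seed)
    root = Node(root_state, untried=propose_options(root_state))
    for _ in range(iterations):
        node, depth, ret = root, 0, 0.0
        # 1. Selection: descend through fully expanded nodes using UCT.
        while not node.untried and node.children:
            node = uct_child(node, c)
            ret += (gamma ** depth) * node.reward
            depth += 1
        # 2. Expansion: add one untried option as a new child.
        if node.untried:
            option = node.untried.pop()
            nxt, r = predict_outcome(node.state, option)
            child = Node(nxt, parent=node, option=option, reward=r,
                         untried=propose_options(nxt))
            node.children[option] = child
            node = child
            ret += (gamma ** depth) * r
            depth += 1
        # 3. Simulation, then 4. backpropagation of the full return.
        ret += simulate(node.state, depth, max_depth, gamma, rng)
        while node is not None:
            node.visits += 1
            node.value += ret
            node = node.parent
    return root


def best_plan(root: Node) -> List[str]:
    # Follow the most-visited child at each level to extract an option plan.
    plan, node = [], root
    while node.children:
        node = max(node.children.values(), key=lambda ch: ch.visits)
        plan.append(node.option)
    return plan


if __name__ == "__main__":
    print(best_plan(mcts(frozenset())))
```

Because `gamma` < 1, plans that reach the goal in fewer options receive higher returns, which is one simple way to discourage detours such as the distractor `collect_dirt`. In the framework the abstract describes, the two stubs would instead be LLM queries, and each selected option would then be executed by a synthesized option-conditioned policy.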
Primary Area: reinforcement learning
Submission Number: 9873