Keywords: Large Language Models, World Models, World Representation, Probing, Reinforcement Learning, State Abstraction
TL;DR: We introduce a new framework for probing world abstraction in LLM-built representations. Experiments on a text-based planning task show that LLMs prefer to maintain goal-oriented abstractions during decoding.
Abstract: How do large language models (LLMs) encode the state of the world, including the status of entities and their relations, as described by a text? While existing work directly probes for a complete state of the world, our research explores whether and how LLMs abstract this world state in their internal representations. We propose a new framework for probing world representations through the lens of state abstraction theory from reinforcement learning. The framework emphasizes different levels of abstraction, distinguishing between general abstractions that facilitate predicting future states and goal-oriented abstractions that guide subsequent actions to accomplish tasks. To instantiate this framework, we design a text-based planning task in which an LLM acts as an agent in an environment and interacts with objects in containers to achieve a specified goal state. Our experiments reveal that fine-tuning, as well as advanced pre-training, strengthens the tendency of LLM-built representations to maintain goal-oriented abstractions during decoding, prioritizing task completion over recovery of the world's state and dynamics.
Primary Area: Natural language processing
Submission Number: 15815