(How) Do Language Models Track State?

Published: 01 May 2025, Last Modified: 23 Jul 2025, ICML 2025 poster, CC BY-SA 4.0
Abstract: Transformer language models (LMs) exhibit behaviors—from storytelling to code generation—that seem to require tracking the unobserved state of an evolving world. How do they do this? We study state tracking in LMs trained or fine-tuned to compose permutations (i.e., to compute the order of a set of objects after a sequence of swaps). Despite the simple algebraic structure of this problem, many other tasks (e.g., simulation of finite automata and evaluation of Boolean expressions) can be reduced to permutation composition, making it a natural model for state tracking in general. We show that LMs consistently learn one of two state-tracking mechanisms for this task. The first closely resembles the “associative scan” construction used in recent theoretical work by Liu et al. (2023) and Merrill et al. (2024). The second uses an easy-to-compute feature (permutation parity) to partially prune the space of outputs, and then refines this with an associative scan. LMs that learn the former algorithm tend to generalize better and converge faster, and we show how to steer LMs toward one or the other with intermediate training tasks that encourage or suppress the heuristics. Our results demonstrate that transformer LMs, whether pre-trained or fine-tuned, can learn to implement efficient and interpretable state-tracking mechanisms, and that the emergence of these mechanisms can be predicted and controlled. Code and data are available at https://github.com/belindal/state-tracking.
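To make the task and the two mechanisms concrete, here is a minimal sketch in Python (not taken from the linked repository; all function and variable names are illustrative assumptions) of permutation composition, a tree-structured associative scan over a sequence of swaps, and the parity feature that the second mechanism uses to prune candidate outputs.

```python
# Minimal sketch of the permutation-composition task and the two mechanisms
# described in the abstract. Names here are illustrative, not from the paper's code.
from functools import reduce
import random

N = 5  # number of objects being tracked (e.g., cups on a table)

def compose(p, q):
    """Compose two permutations: apply p first, then q."""
    return tuple(q[p[i]] for i in range(len(p)))

def random_swap(n):
    """One 'swap' step: a transposition of two positions."""
    i, j = random.sample(range(n), 2)
    p = list(range(n))
    p[i], p[j] = p[j], p[i]
    return tuple(p)

def sequential_state(swaps, n):
    """Left-to-right state tracking: fold the swaps one step at a time."""
    return reduce(compose, swaps, tuple(range(n)))

def associative_scan_state(swaps):
    """Tree-structured ('associative scan') composition: combine adjacent pairs
    layer by layer, needing only logarithmically many layers of compositions."""
    layer = list(swaps)
    while len(layer) > 1:
        nxt = [compose(layer[k], layer[k + 1]) for k in range(0, len(layer) - 1, 2)]
        if len(layer) % 2:          # carry an unpaired element to the next layer
            nxt.append(layer[-1])
        layer = nxt
    return layer[0]

def parity(p):
    """Permutation parity (0 = even, 1 = odd): the cheap feature the second
    mechanism uses to rule out half of the candidate final orderings."""
    seen, cycles = set(), 0
    for i in range(len(p)):
        if i not in seen:
            cycles += 1
            j = i
            while j not in seen:
                seen.add(j)
                j = p[j]
    return (len(p) - cycles) % 2

if __name__ == "__main__":
    swaps = [random_swap(N) for _ in range(16)]
    final = sequential_state(swaps, N)
    assert associative_scan_state(swaps) == final
    # Each swap is a transposition (odd), so the final parity is len(swaps) mod 2.
    assert parity(final) == len(swaps) % 2
    print("final order:", final, "parity:", parity(final))
```

The assertion that the scan matches the left-to-right fold is exactly the associativity that makes the tree-structured construction possible, and the parity check illustrates why one cheap feature already eliminates half of the possible final orderings before any scan is needed.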
Lay Summary: Language models (like ChatGPT) often seem to "understand" what's happening in a story or a game by keeping track of how things change over time. But how do they actually do this? To study this, we trained models on puzzles that involve rearranging objects—like shuffling cups on a table—and asked them to figure out where everything ends up. These puzzles are simple but reflect the kind of memory and tracking needed in tasks like reasoning about code or playing games. We found that models learn one of two strategies. One strategy combines chunks of information in a tree-like way, layer by layer, processing many steps in parallel rather than one at a time. The other strategy starts with a shortcut: it rules out many possible answers using a quick heuristic, then fills in the rest of the details using the first, chunking-based strategy. The first strategy tends to be more reliable, especially for longer problems. We also found that factors like model architecture and training setup influence which strategy a model ends up using. Understanding these strategies helps us better interpret how language models reason—and how to shape them to be more accurate and trustworthy.
Link To Code: https://github.com/belindal/state-tracking
Primary Area: Deep Learning->Large Language Models
Keywords: state tracking, simulation, interpretability
Flagged For Ethics Review: true
Submission Number: 12654