Linear Latent World Models in Simple Transformers: A Case Study on Othello-GPT

Published: 23 Oct 2023, Last Modified: 28 Nov 2023, SoLaR Poster
Keywords: Transformer, Linear Representation, World Model, Othello, Causal Intervention
TL;DR: The paper shows that a transformer trained on Othello develops a linear internal representation of the board state that it uses causally to make move predictions, especially in the middle layers.
Abstract: Foundation models exhibit significant capabilities in decision-making and logical deduction. Nonetheless, debate continues over whether they genuinely understand the world or merely mimic their training data stochastically. This paper examines a simple transformer trained on Othello, extending prior research to better understand the emergent world model of Othello-GPT. The investigation reveals that Othello-GPT encodes a linear representation of opposing pieces, and that this representation causally steers its decision-making. The paper further clarifies how the linear world representation and causal decision-making interact, and how both depend on layer depth and model complexity.
Submission Number: 40
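
The two ideas named in the abstract, a linear representation and a causal intervention on it, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not the paper's code or data: the activations are synthetic stand-ins for a transformer's hidden states, the three classes (empty, own piece, opposing piece) describe a single hypothetical board tile, the probe is fit by ordinary least squares rather than by whatever training procedure the authors used, and the names `fit_linear_probe`, `probe_predict`, and `intervene` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
D_MODEL, N_SAMPLES, N_CLASSES = 64, 3000, 3  # classes: 0=empty, 1=own, 2=opposing

# Synthetic stand-in for hidden activations: one direction per tile state plus noise.
class_dirs = rng.normal(size=(N_CLASSES, D_MODEL))
labels = rng.integers(0, N_CLASSES, size=N_SAMPLES)
acts = class_dirs[labels] + 0.5 * rng.normal(size=(N_SAMPLES, D_MODEL))


def fit_linear_probe(x, y, n_classes):
    """Fit W, b by least squares so that x @ W + b approximates one-hot labels."""
    x_aug = np.hstack([x, np.ones((x.shape[0], 1))])  # append a bias column
    targets = np.eye(n_classes)[y]
    weights, *_ = np.linalg.lstsq(x_aug, targets, rcond=None)
    return weights[:-1], weights[-1]


def probe_predict(x, W, b):
    """Return the class with the highest probe score for each activation."""
    return np.argmax(x @ W + b, axis=-1)


def intervene(x, W, b, target, margin=0.5):
    """Shift one activation along the probe direction so the target class
    scores `margin` higher than the class the probe currently reads."""
    scores = x @ W + b
    current = int(np.argmax(scores))
    if current == target:
        return x.copy()
    direction = W[:, target] - W[:, current]
    gap = scores[current] - scores[target]
    step = (gap + margin) / (direction @ direction)
    return x + step * direction


# Fit on the first 2000 samples, evaluate on the remaining 1000.
W, b = fit_linear_probe(acts[:2000], labels[:2000], N_CLASSES)
held_out, held_out_labels = acts[2000:], labels[2000:]
accuracy = np.mean(probe_predict(held_out, W, b) == held_out_labels)

# Edit 50 held-out activations toward a different class and re-read the probe.
flipped = []
for x, y in zip(held_out[:50], held_out_labels[:50]):
    target = (int(y) + 1) % N_CLASSES
    edited = intervene(x, W, b, target)
    flipped.append(int(probe_predict(edited[None, :], W, b)[0]) == target)
flip_rate = np.mean(flipped)

print(f"probe accuracy: {accuracy:.3f}, intervention flip rate: {flip_rate:.3f}")
```

In this sketch the intervention only changes what the probe reads. The causal claim in the abstract additionally requires feeding the edited activation through the remaining layers and checking that the model's move predictions change accordingly, which needs the trained model itself.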