Revisiting the Othello World Model Hypothesis

ACL ARR 2024 June Submission665 Authors

12 Jun 2024 (modified: 02 Jul 2024) · ACL ARR 2024 June Submission · Readers: Everyone · CC BY 4.0
Abstract: \citet{li2023emergent} used the Othello board game as a test case for the ability of GPT-2 to induce world models, and their work was followed up by \citet{nanda-etal-2023-emergent}. We briefly revisit the original experiments and extend them to more language models and more detailed probing. Specifically, we analyze sequences of Othello board states and train each model to predict the next move from the previous moves. We evaluate six language models (GPT-2, T5, BART, Flan-T5, Mistral, and Llama-2) on the Othello task and conclude that these models not only learn to play Othello but also induce the Othello board layout. We find that all models achieve up to 99\% accuracy in \textit{unsupervised} grounding and exhibit high similarity in the board features they learn. This provides much stronger evidence for the Othello World Model Hypothesis than previous work.
Paper Type: Short
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: language model probing, world models, othello game
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English
Submission Number: 665
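
The next-move prediction setup described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `next_move_examples`, the coordinate notation for moves, and the sample opening are assumptions made for illustration only.

```python
from typing import List, Tuple


def next_move_examples(moves: List[str]) -> List[Tuple[List[str], str]]:
    """Turn one game transcript into (previous moves, next move) pairs.

    Each pair is one training example for next-move prediction:
    the model sees the prefix and is asked to predict the target.
    """
    return [(moves[:i], moves[i]) for i in range(1, len(moves))]


# Hypothetical short transcript in coordinate notation (assumed format).
game = ["f5", "d6", "c3", "d3"]
examples = next_move_examples(game)

for prefix, target in examples:
    print(" ".join(prefix), "->", target)
```

A transcript of n moves yields n - 1 examples, since the first move has no preceding context in this sketch.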