What Has a Foundation Model Found? Using Inductive Bias to Probe for World Models

Keyon Vafa; Peter G. Chang; Ashesh Rambachan; Sendhil Mullainathan

What Has a Foundation Model Found? Using Inductive Bias to Probe for World Models

Keyon Vafa, Peter G. Chang, Ashesh Rambachan, Sendhil Mullainathan

Published: 01 May 2025, Last Modified: 14 Aug 2025ICML 2025 posterEveryoneRevisionsBibTeXCC BY 4.0

TL;DR: We develop inductive bias probes to test whether foundation models have inductive biases toward specific world models.

Abstract: Foundation models are premised on the idea that sequence prediction can uncover deeper domain understanding, much like how Kepler's predictions of planetary motion later led to the discovery of Newtonian mechanics. However, evaluating whether these models truly capture deeper structure remains a challenge. We develop a technique for evaluating foundation models that examines how they adapt to synthetic datasets generated from some postulated world model. Our technique measures whether the foundation model's inductive bias aligns with the world model, and so we refer to it as an inductive bias probe. Across multiple domains, we find that foundation models can excel at their training tasks yet fail to develop inductive biases towards the underlying world model when adapted to new tasks. We particularly find that foundation models trained on orbital trajectories consistently fail to apply Newtonian mechanics when adapted to new physics tasks. Further analysis reveals that these models behave as if they develop task-specific heuristics that fail to generalize.

Lay Summary: Scientists have often made discoveries by first making predictions about the world around them. For example, astronomers like Kepler noticed geometric patterns that could be used to pinpoint the future locations of planets in the night sky. Newton would later expand on these results to develop Newtonian mechanics, fundamental laws that could not only predict the movement of planets but also explain physical properties across the universe. Similarly, modern AI systems known as foundation models can predict the next item in a sequence -- whether that sequence is words in a sentence or positions of planets. But does that predictive skill mean the model truly understands the deeper rules that govern the world? This paper develops a test for assessing these "world models" of foundation models. We find that most often, foundation models that excel at making predictions have not uncovered the laws that govern them. We illustrate this by training a foundation model on planetary orbits, and show that it uncovers a warped understanding of the world with no resemblance to Newtonian mechanics. So far, foundation models are not capable of making the transition humans are capable of making: from good predictions to accurate world models.

Primary Area: Deep Learning->Foundation Models

Keywords: world models, foundation models, inductive bias, large language models

Link To Code: https://github.com/keyonvafa/inductive-bias-probes

Submission Number: 13728

Loading