How Do Transformers "Do" Physics? Investigating the Simple Harmonic Oscillator

Published: 24 Jun 2024, Last Modified: 24 Jun 2024. ICML 2024 MI Workshop Poster. License: CC BY 4.0
Keywords: Mechanistic Interpretability, AI for Science, Physics, Transformers
TL;DR: By analyzing the intermediates encoded in transformers' hidden states, we show that transformers use well-known numerical methods to predict the trajectories of a simple harmonic oscillator.
Abstract: How do transformers model physics? We take a step toward demystifying this larger puzzle by investigating the simple harmonic oscillator (SHO), $\ddot{x}+2\gamma \dot{x}+\omega_0^2x=0$, one of the most fundamental systems in physics. Our goal is to identify the methods transformers use to model the SHO; to do so, we hypothesize candidate methods and evaluate each one by analyzing how its intermediates are encoded in the model. We first develop two correlational and two causal criteria for the use of a method, using linear regression as a simple testbed: there the method is $y = wx$ and the intermediate is $w$. Armed with these four criteria, we determine that transformers use known numerical methods to model SHO trajectories, specifically the matrix exponential method. Our analysis framework extends readily to high-dimensional linear systems and to nonlinear systems, and we hope it will help reveal the "world model" hidden in transformers.
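To make the matrix exponential method concrete, here is a minimal NumPy sketch (our illustration, not the authors' code). It assumes the trajectory is sampled at a fixed step $\Delta t$ as state vectors $s = (x, \dot{x})^\top$. Rewriting the SHO as $\dot{s} = As$ with $A = \begin{pmatrix} 0 & 1 \\ -\omega_0^2 & -2\gamma \end{pmatrix}$ gives the exact one-step update $s_{n+1} = e^{A\Delta t} s_n$, so $e^{A\Delta t}$ is the kind of intermediate one would probe hidden states for. All function names are hypothetical, and `expm_taylor` is a simple scaling-and-squaring Taylor stand-in for `scipy.linalg.expm`, used only to keep the sketch NumPy-only.

```python
import numpy as np


def expm_taylor(M: np.ndarray, terms: int = 30) -> np.ndarray:
    """Matrix exponential via scaling and squaring with a truncated Taylor series.

    Hypothetical helper standing in for scipy.linalg.expm.
    """
    # Scale M so its norm is small: exp(M) = exp(M / 2**s) ** (2**s).
    norm = np.linalg.norm(M, ord=np.inf)
    s = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    scaled = M / (2 ** s)
    result = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms + 1):
        term = term @ scaled / k  # k-th Taylor term: scaled**k / k!
        result = result + term
    # Undo the scaling by squaring s times.
    for _ in range(s):
        result = result @ result
    return result


def sho_generator(gamma: float, omega0: float) -> np.ndarray:
    """A such that d/dt [x, v] = A [x, v] is equivalent to x'' + 2*gamma*x' + omega0**2 * x = 0."""
    return np.array([[0.0, 1.0],
                     [-omega0 ** 2, -2.0 * gamma]])


def rollout_matrix_exponential(x0, v0, gamma, omega0, dt, n_steps):
    """Propagate [x, v] with the exact one-step map s_{n+1} = exp(A*dt) @ s_n."""
    step = expm_taylor(sho_generator(gamma, omega0) * dt)  # candidate intermediate
    states = [np.array([x0, v0], dtype=float)]
    for _ in range(n_steps):
        states.append(step @ states[-1])
    return np.stack(states)  # shape (n_steps + 1, 2)


def analytic_underdamped(x0, v0, gamma, omega0, t):
    """Closed-form x(t) for the underdamped case gamma < omega0, used as a check."""
    wd = np.sqrt(omega0 ** 2 - gamma ** 2)
    return np.exp(-gamma * t) * (
        x0 * np.cos(wd * t) + (v0 + gamma * x0) / wd * np.sin(wd * t)
    )


if __name__ == "__main__":
    gamma, omega0, dt, n = 0.1, 2.0, 0.1, 50
    traj = rollout_matrix_exponential(1.0, 0.0, gamma, omega0, dt, n)
    t = dt * np.arange(n + 1)
    err = np.max(np.abs(traj[:, 0] - analytic_underdamped(1.0, 0.0, gamma, omega0, t)))
    print("max |x_expm - x_exact| =", err)
```

Because $e^{A\Delta t}$ is the exact flow of a linear system, the rollout agrees with the closed-form solution up to floating-point error. In the linear-regression testbed from the abstract, the analogous intermediate is the scalar $w$ in $y = wx$; here it is the $2 \times 2$ matrix $e^{A\Delta t}$.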
Submission Number: 11