Constrained Belief Updating and Geometric Structures in Transformer Representations

Published: 23 Oct 2024 · Last Modified: 24 Feb 2025 · NeurReps 2024 Poster · CC BY 4.0
Keywords: computational mechanics, mechanistic interpretability, belief state geometry
TL;DR: Transformers trained on sequences from simple Hidden Markov Models form fractal intermediate representations related to, but distinct from, optimal belief state geometries, which can be explained by constrained belief updating equations.
Abstract:

How do transformers trained on next-token prediction represent their inputs? Our analysis reveals that in simple settings, transformers form intermediate representations with fractal structures distinct from, yet closely related to, the geometry of belief states of an optimal predictor. We identify the algorithmic process by which these representations form, and we connect this mechanism to constrained belief updating equations, offering insight into the geometric meaning of these fractals. These findings bridge the gap between the model-agnostic theory of belief state geometry and the specific architectural constraints of transformers.
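The "belief states of an optimal predictor" referenced above are the Bayesian posteriors over the hidden states of the data-generating HMM, updated token by token. The following is a minimal sketch of that optimal belief updating, using a hypothetical 2-state, 2-token HMM whose transition matrices are illustrative and not taken from the paper; the sequence of belief vectors this recursion produces is what traces out the belief state geometry.

```python
import numpy as np

# Hypothetical token-labeled transition matrices for a 2-state HMM:
# T[x][i, j] = P(next state = j, emit token = x | current state = i).
# The numbers are made up for illustration; rows of T[0] + T[1] sum to 1.
T = {
    0: np.array([[0.4, 0.1],
                 [0.2, 0.1]]),
    1: np.array([[0.1, 0.4],
                 [0.1, 0.6]]),
}

def update_belief(eta, x):
    """One step of optimal Bayesian belief updating: eta' ∝ eta @ T[x]."""
    unnormalized = eta @ T[x]
    return unnormalized / unnormalized.sum()

# Track the belief state along an example token sequence, starting from
# a uniform prior over hidden states (illustrative choice).
eta = np.array([0.5, 0.5])
for token in [0, 1, 1, 0]:
    eta = update_belief(eta, token)
    print(token, eta)
```

Iterating this update over all possible token sequences yields the set of reachable belief states; the paper's claim is that transformer representations realize a fractal structure related to, but distinct from, this set, due to architectural constraints on how the update can be computed.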

Submission Number: 31