Transformers Represent Causal Abstractions

Published: 23 Sept 2025, Last Modified: 27 Nov 2025
NeurReps 2025 Poster
License: CC BY 4.0
Keywords: Abstraction, emergence, coarse-graining, lumpability, Hidden Markov models, transformers, representation learning
TL;DR: Preliminary results suggest phase changes in the causal abstractions preferred by transformers
Abstract: Agents often interact with environments too complex to model in microscopic detail. Abstractions offer a way to form useful models anyway. When and how do such abstractions arise? Drawing on recent work on macro-level structure ("emergence") in complex systems, we hypothesize that agents interacting with such systems naturally learn abstractions aligned with the macro-level. To investigate, we introduce a parameterized hidden Markov model (HMM) with a tunable degree of macro-structure. We then train a transformer on sequences of observables generated by the HMM and track the evolution of abstractions represented in its residual stream. As the macro-structure parameter is varied, we observe systematic changes in internal representations and dynamics. These results provide preliminary evidence that exposure to macro-structured processes drives the emergence of abstractions in deep models.
Submission Number: 163
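As a rough illustration of the setup the abstract describes, here is a minimal Python sketch (not the authors' code) of a parameterized HMM whose degree of macro-structure is tunable. It assumes a lumpability-based construction: at alpha = 0 the micro-chain is exactly lumpable into macro-states, and at alpha = 1 its transitions are unstructured. All names (alpha, n_macro, micro_per_macro) and the interpolation scheme are illustrative assumptions, not the paper's construction.

    import numpy as np

    def make_hmm(n_macro=3, micro_per_macro=4, alpha=0.0, seed=0):
        rng = np.random.default_rng(seed)
        n = n_macro * micro_per_macro
        block = np.repeat(np.arange(n_macro), micro_per_macro)  # macro label of each micro-state

        # Macro-level transition matrix (each row is a distribution over macro-states).
        P_macro = rng.dirichlet(np.ones(n_macro), size=n_macro)

        # Exactly lumpable micro-chain: every micro-state in block b sends the
        # same total mass P_macro[b, b'] into block b', spread uniformly over
        # that block's micro-states (strong lumpability).
        P_lump = P_macro[block][:, block] / micro_per_macro

        # Unstructured micro-chain: independent random rows.
        P_rand = rng.dirichlet(np.ones(n), size=n)

        # Convex combination of stochastic matrices is stochastic;
        # alpha tunes the degree of macro-structure.
        P = (1 - alpha) * P_lump + alpha * P_rand

        # Random emission distribution per micro-state (observable symbols).
        E = rng.dirichlet(np.ones(n), size=n)
        return P, E, block

    def sample_observables(P, E, T=512, seed=0):
        rng = np.random.default_rng(seed)
        s = rng.integers(P.shape[0])
        obs = np.empty(T, dtype=np.int64)
        for t in range(T):
            obs[t] = rng.choice(E.shape[1], p=E[s])  # emit an observable symbol
            s = rng.choice(P.shape[0], p=P[s])       # step the hidden micro-chain
        return obs

    P, E, block = make_hmm(alpha=0.25)
    tokens = sample_observables(P, E)  # observable sequence to train a transformer on

Sweeping alpha and training a transformer on sequences like `tokens` would set up the kind of experiment the abstract outlines, with the residual stream then probed for macro-state (block-level) information.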