Universal Approximation of Mean-Field Models via Transformers

Published: 01 May 2025 | Last Modified: 18 Jun 2025 | ICML 2025 poster | License: CC BY 4.0
TL;DR: We show, both empirically and theoretically, that transformers can approximate infinite-dimensional mean-field equations.
Abstract: This paper investigates the use of transformers to approximate the mean-field dynamics of interacting particle systems exhibiting collective behavior. Such systems are fundamental in modeling phenomena across physics, biology, and engineering, including opinion formation, biological networks, and swarm robotics. The key characteristic of these systems is that the particles are indistinguishable, leading to permutation-equivariant dynamics. First, we empirically demonstrate that transformers are well suited to approximating a variety of mean-field models, including the Cucker–Smale model for flocking and milling and the mean-field system for training two-layer neural networks. We then validate our numerical experiments with mathematical theory. Specifically, we prove that if a finite-dimensional transformer effectively approximates the finite-dimensional vector field governing the particle system, then the $L_\infty$ distance between the \textit{expected transformer} and the infinite-dimensional mean-field vector field can be bounded by a function of the number of particles observed during training. Leveraging this result, we establish theoretical bounds on the distance between the true mean-field dynamics and those obtained using the transformer.
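To make the setting concrete, here is a minimal sketch (ours, not the authors' code; the interaction parameters `K` and `beta`, the particle count, and the network sizes are all illustrative assumptions). It simulates the finite-N Cucker–Smale vector field and fits a transformer encoder without positional encodings to it, so the surrogate is permutation-equivariant by construction, matching the indistinguishability of the particles.

```python
# A minimal sketch, assuming a standard PyTorch setup; not the authors' implementation.
import torch
import torch.nn as nn

def cucker_smale_rhs(x, v, K=1.0, beta=0.5):
    """N-particle Cucker-Smale vector field:
       dx_i/dt = v_i,
       dv_i/dt = (1/N) * sum_j phi(|x_j - x_i|) * (v_j - v_i),
    with communication rate phi(r) = K / (1 + r^2)^beta (K, beta illustrative)."""
    diff_x = x.unsqueeze(1) - x.unsqueeze(2)              # (B, N, N, d), entry [i, j] = x_j - x_i
    rates = K / (1.0 + diff_x.norm(dim=-1) ** 2) ** beta  # (B, N, N) interaction weights
    diff_v = v.unsqueeze(1) - v.unsqueeze(2)              # (B, N, N, d), entry [i, j] = v_j - v_i
    return v, (rates.unsqueeze(-1) * diff_v).mean(dim=2)  # (dx/dt, dv/dt)

class VectorFieldTransformer(nn.Module):
    """Maps the set {(x_i, v_i)} to per-particle accelerations. No positional
    encoding is used, so permuting the input particles permutes the output:
    the model is permutation-equivariant, like the particle system itself."""
    def __init__(self, d=2, width=64, heads=4, layers=2):
        super().__init__()
        self.embed = nn.Linear(2 * d, width)
        block = nn.TransformerEncoderLayer(d_model=width, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(block, num_layers=layers)
        self.head = nn.Linear(width, d)

    def forward(self, x, v):                               # x, v: (B, N, d)
        h = self.embed(torch.cat([x, v], dim=-1))
        return self.head(self.encoder(h))                  # predicted dv/dt, (B, N, d)

# Fit the surrogate to the exact finite-N vector field on random configurations.
model = VectorFieldTransformer(d=2)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(1000):
    x, v = torch.randn(32, 16, 2), torch.randn(32, 16, 2)  # 32 configurations, N=16 particles
    _, dv = cucker_smale_rhs(x, v)
    loss = nn.functional.mse_loss(model(x, v), dv)
    opt.zero_grad(); loss.backward(); opt.step()
```

In the paper's framework, a surrogate of this kind plays the role of the finite-dimensional transformer: once it approximates the finite-N vector field well, the theory bounds the $L_\infty$ gap between the expected transformer and the mean-field vector field in terms of the number of particles seen during training.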
Lay Summary: This work shows that transformer networks (the same models behind today’s language AIs) can learn to predict how large groups of identical “particles” (like birds in a flock, robots in a swarm, or neurons in a simple neural net) move together. Instead of tracking each particle, scientists often use “mean-field” equations describing the crowd’s overall behavior. Because transformers naturally handle many inputs without regard to order, they’re ideal for these indistinguishable-agent systems. The authors train transformers on two classic examples, the Cucker–Smale flocking model and a mean-field view of two-layer neural-network training, and find excellent agreement with simulated data. They then prove that if a transformer closely approximates the rules for a finite number of particles, one can mathematically bound its error when modeling infinitely many, giving a clear guarantee on how the number of particles seen during training controls accuracy.
Link To Code: https://github.com/rsonthal/Mean-Field-Transformers
Primary Area: Deep Learning
Keywords: Mean Field Equations, Transformers, Universal Approximation, Collective Behavior
Submission Number: 12224