Chaos Meets Attention: Transformers for Large-Scale Dynamical Prediction

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: We propose ChaosMeetsAttention, a scalable, ergodicity-preserving transformer framework for generating long-term, high-dimensional chaotic dynamics, and introduce new benchmarks for machine learning on chaos.
Abstract: Generating long-term trajectories of dissipative chaotic systems autoregressively is a highly challenging task, as the inherent positive Lyapunov exponents amplify prediction errors over time. Many chaotic systems nevertheless possess a crucial property, ergodicity on their attractors, which makes long-term statistical prediction possible. State-of-the-art methods address ergodicity by preserving statistical properties using optimal transport techniques. However, these methods face scalability challenges due to the curse of dimensionality when matching distributions. To overcome this bottleneck, we propose a scalable transformer-based framework capable of stably generating long-term, high-dimensional, and high-resolution chaotic dynamics while preserving ergodicity. Our method is grounded in a physical perspective, revisiting the von Neumann mean ergodic theorem to ensure the preservation of long-term statistics in the $\mathcal{L}^2$ space. We introduce novel modifications to the attention mechanism, making the transformer architecture well-suited for learning large-scale chaotic systems. Compared to operator-based and transformer-based methods, our model achieves better performance across five metrics, ranging from short-term prediction accuracy to long-term statistics. In addition to our methodological contributions, we introduce new chaotic-system benchmarks: a machine learning dataset of 140k snapshots of turbulent channel flow and a processed high-dimensional Kolmogorov Flow dataset, along with various evaluation metrics for both short- and long-term performance. Both are well suited for machine learning research on chaotic systems.
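For readers unfamiliar with it, the von Neumann mean ergodic theorem invoked above can be stated as follows (a standard formulation, not drawn from the paper): for a measure-preserving transformation $T$ on a probability space $(X, \mathcal{B}, \mu)$ with Koopman operator $(U_T f)(x) = f(Tx)$ acting on $\mathcal{L}^2(\mu)$,

$$\frac{1}{N}\sum_{n=0}^{N-1} U_T^n f \;\longrightarrow\; Pf \quad \text{in } \mathcal{L}^2(\mu),$$

where $P$ is the orthogonal projection onto the subspace of $U_T$-invariant functions. When $T$ is ergodic, $Pf = \int_X f\,d\mu$, so the time average of any square-integrable observable converges to its attractor mean, which is what makes matching long-term statistics a meaningful training target.

As an illustration only, the sketch below shows how such time-averaged statistic matching in $\mathcal{L}^2$ could appear as a rollout regularizer; the function name, the choice of first- and second-moment observables, and the equal weighting are our assumptions, not the paper's implementation:

```python
import torch

def ergodicity_loss(pred_traj: torch.Tensor, ref_traj: torch.Tensor) -> torch.Tensor:
    """Penalize the L^2 gap between time-averaged observables of a generated
    rollout and a reference trajectory.

    pred_traj, ref_traj: (T, ...) tensors of rollout snapshots. By the mean
    ergodic theorem, time averages of L^2 observables of an ergodic system
    converge to their attractor means, so this term discourages an
    autoregressive rollout from drifting off the attractor even when
    pointwise prediction has long since failed.
    """
    # Time-averaged first and second moments as simple L^2 observables.
    pred_mean, ref_mean = pred_traj.mean(dim=0), ref_traj.mean(dim=0)
    pred_sq, ref_sq = (pred_traj ** 2).mean(dim=0), (ref_traj ** 2).mean(dim=0)
    # L^2 discrepancy between the time-averaged statistics.
    return ((pred_mean - ref_mean) ** 2).mean() + ((pred_sq - ref_sq) ** 2).mean()
```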
Lay Summary: Chaotic systems are widely known for the "butterfly effect," where tiny changes can lead to vastly different outcomes. Predicting their long-term behavior is extremely hard, especially when the system has many moving parts (high dimensions) and fine details (high resolution). Although chaos seems unpredictable, these systems often follow hidden statistical patterns over time. Our research tackles the challenge of simulating such systems by combining physical insights with multi-stage accelerations for scaling. We designed a new transformer-based model, similar to those powering language and vision models, with an innovated attention mechanism and training method, creating a faster and more robust way to generate long-term chaotic dynamics. We also introduce to the community new benchmarks designed specifically for early-stage machine learning research on chaotic systems. Our goal is to make large-scale chaos a little more predictable and to accelerate the related machine learning research and applications.
Application-Driven Machine Learning: This submission is on Application-Driven Machine Learning.
Link To Code: https://github.com/Hy23333/ChaosMeetsAttention
Primary Area: Applications->Chemistry, Physics, and Earth Sciences
Keywords: Dynamical system, chaos, transformer, auto-regression
Submission Number: 2640