TL;DR: We develop gridded transformer neural processes (gridded TNPs), a new member of the TNP family that enables the use of efficient attention mechanisms to tackle problems involving large amounts of unstructured data.
Abstract: Effective modelling of large-scale spatio-temporal datasets is essential for many domains, yet existing approaches often impose rigid constraints on the input data, such as requiring them to lie on fixed-resolution grids. With the rise of foundation models, the ability to process diverse, heterogeneous data structures is becoming increasingly important. Neural processes (NPs), particularly transformer neural processes (TNPs), offer a promising framework for such tasks, but struggle to scale to large spatio-temporal datasets due to the lack of an efficient attention mechanism. To address this, we introduce gridded pseudo-token TNPs, which employ specialised encoders and decoders to handle unstructured data and a processor that applies efficient attention mechanisms over gridded pseudo-tokens. Furthermore, we develop equivariant gridded TNPs for applications where exact or approximate translation equivariance is a useful inductive bias, improving accuracy and training efficiency. Our method consistently outperforms a range of strong baselines in various synthetic and real-world regression tasks involving large-scale data, while maintaining competitive computational efficiency. Experiments with weather data highlight the potential of gridded TNPs and serve as just one example of a domain where they can have a significant impact.
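To make the encoder-processor-decoder design concrete, here is a minimal, illustrative PyTorch sketch of the pipeline: cross-attention moves information from unstructured context points onto gridded pseudo-tokens, a transformer processes the grid (standing in for the efficient attention mechanisms used in the paper), and cross-attention reads predictions off at arbitrary target locations. All module names, shapes, and hyperparameters below are our assumptions, not the authors' implementation; see the linked repository for the real one.

```python
# Minimal sketch of a gridded-TNP-style pipeline (illustrative assumptions only).
import torch
import torch.nn as nn

class GriddedTNPSketch(nn.Module):
    def __init__(self, dim=64, grid_size=16, num_heads=4):
        super().__init__()
        # Learnable pseudo-tokens arranged on a fixed 2D grid.
        self.pseudo_tokens = nn.Parameter(torch.randn(grid_size * grid_size, dim))
        self.embed_ctx = nn.Linear(3, dim)   # (x, y, value) -> context token
        self.embed_tgt = nn.Linear(2, dim)   # (x, y) -> target query token
        # Encoder: cross-attention from grid pseudo-tokens to unstructured context.
        self.encoder = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Processor: attention over the regular grid; a vanilla transformer
        # stands in here for an efficient (e.g. windowed) attention mechanism.
        self.processor = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, num_heads, batch_first=True), num_layers=2
        )
        # Decoder: cross-attention from target locations back to the grid.
        self.decoder = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.head = nn.Linear(dim, 2)        # predictive mean and log-variance

    def forward(self, ctx_xy, ctx_val, tgt_xy):
        b = ctx_xy.shape[0]
        ctx = self.embed_ctx(torch.cat([ctx_xy, ctx_val], dim=-1))
        grid = self.pseudo_tokens.expand(b, -1, -1)
        grid, _ = self.encoder(grid, ctx, ctx)   # unstructured data -> grid
        grid = self.processor(grid)              # grid -> grid
        tgt = self.embed_tgt(tgt_xy)
        out, _ = self.decoder(tgt, grid, grid)   # grid -> target locations
        return self.head(out)                    # per-target mean, log-variance

# Example: 512 scattered 2D context points, 128 target locations.
model = GriddedTNPSketch()
ctx_xy, ctx_val = torch.rand(1, 512, 2), torch.randn(1, 512, 1)
pred = model(ctx_xy, ctx_val, torch.rand(1, 128, 2))  # -> shape (1, 128, 2)
```

The key property of this layout is that the quadratic cost of attention is paid only between the data and the fixed-size grid, and within the grid itself, where efficient structured attention applies.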
Lay Summary: Many real-world systems, such as weather, climate, and scientific computing simulators, generate complex spatio-temporal data that are difficult to model. These data often come from heterogeneous sources (e.g., sensors, simulations) and are recorded at irregular times and locations, making them challenging to process with standard machine learning methods. Existing models that can handle such large-scale data tend to require the data to be structured on fixed grids, limiting their flexibility and generality.
In this work, we address this limitation by developing **gridded Transformer Neural Processes** (gridded TNPs), a modelling framework that can flexibly handle unstructured spatio-temporal data. Our approach uses attention-based mechanisms to first encode irregular data onto a grid and then apply efficient transformer architectures for learning. We also introduce a variant that incorporates spatial symmetries, which improves both training efficiency and generalisation capabilities in settings where the data are (roughly) stationary.
We evaluate our method on synthetic and real-world datasets, including weather data, and show that it outperforms existing strong baselines while remaining computationally efficient. Our framework helps advance the development of data-driven, flexible, and scalable models for real-world spatio-temporal problems.
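For the variant with spatial symmetries, one standard way to build (discrete) translation equivariance into attention over a grid is to let the attention logits depend only on the *relative* offsets between grid cells rather than their absolute positions. The sketch below is a generic relative-position-bias attention layer in that spirit; the specific parameterisation is an assumption for illustration, not the paper's exact construction.

```python
# Sketch of translation-equivariant attention over grid tokens: attention
# logits receive a learned bias indexed by relative offset (dx, dy), so the
# layer commutes with grid shifts (up to boundary effects). Illustrative only.
import torch
import torch.nn as nn

class RelativeBiasAttention(nn.Module):
    def __init__(self, dim=64, grid_size=16):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        self.scale = dim ** -0.5
        # One learnable bias per possible relative offset on the grid.
        self.bias = nn.Parameter(torch.zeros(2 * grid_size - 1, 2 * grid_size - 1))
        coords = torch.stack(torch.meshgrid(
            torch.arange(grid_size), torch.arange(grid_size), indexing="ij"
        ), dim=-1).reshape(-1, 2)                             # (N, 2) cell coords
        rel = coords[:, None] - coords[None, :] + grid_size - 1  # offsets >= 0
        self.register_buffer("rel", rel)                       # (N, N, 2)

    def forward(self, x):                                      # x: (B, N, dim)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        logits = q @ k.transpose(-2, -1) * self.scale
        logits = logits + self.bias[self.rel[..., 0], self.rel[..., 1]]
        return torch.softmax(logits, dim=-1) @ v

layer = RelativeBiasAttention()
y = layer(torch.randn(2, 16 * 16, 64))                        # -> (2, 256, 64)
```

Because the bias depends only on the offset between cells, shifting the entire input field by one grid cell shifts the output by the same amount, which is the kind of inductive bias that helps when the underlying data are (roughly) stationary.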
Link To Code: https://github.com/cambridge-mlg/gridded-tnp
Primary Area: Probabilistic Methods
Keywords: neural process, probabilistic machine learning, transformer, spatio-temporal data, spatio-temporal modelling, translation equivariance
Submission Number: 5244