(Sparse) Attention to the Details: Preserving Spectral Fidelity in ML-based Weather Forecasting Models

Published: 03 Mar 2026, Last Modified: 03 Mar 2026
Venue: ICLR 2026 Workshop FM4Science Poster
License: CC BY 4.0
Keywords: sparse attention, weather forecasting, uncertainty quantification
TL;DR: We propose using sparse attention with learned functional perturbations for processing high-resolution weather forecasting data.
Abstract: We introduce Mosaic, a probabilistic weather forecasting model that addresses two sources of spectral degradation in ML-based weather prediction: (1) training to predict the ensemble mean deterministically and (2) compressive encoding creating an information bottleneck. Mosaic combines learned functional perturbations for ensemble forecasting with block-sparse attention, a hardware-aligned formulation that shares keys and values across spatially adjacent queries, enabling each block to dynamically attend to the most relevant regions. By capturing arbitrarily long-range dependencies at linear cost, Mosaic processes high-resolution weather data without compression. Mosaic at 1.5° resolution matches or outperforms models trained on 0.25° data and achieves state-of-the-art results among 1.5° models on key upper-air variables, with individual ensemble members exhibiting near-perfect spectral alignment across all resolved frequencies.
Submission Number: 98
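
Illustrative sketch (not the authors' implementation): the abstract describes block-sparse attention in which spatially adjacent queries share keys and values, and each query block dynamically attends to the most relevant regions. The NumPy code below is one minimal way such a scheme could look, under assumptions the abstract does not confirm: tokens are flattened to a 1D sequence, blocks are contiguous, relevance is scored by the dot product of mean-pooled block summaries, and each query block keeps a fixed number of key blocks. The function name and the `block_size` and `top_k` parameters are hypothetical.

```python
import numpy as np

def block_sparse_attention(q, k, v, block_size, top_k):
    """Single-head block-sparse attention sketch.

    q, k, v: arrays of shape (n, d), with n divisible by block_size.
    Every query in a block attends to the same top_k key/value blocks.
    """
    n, d = q.shape
    nb = n // block_size
    qb = q.reshape(nb, block_size, d)
    kb = k.reshape(nb, block_size, d)
    vb = v.reshape(nb, block_size, d)

    # Assumed routing heuristic: score block pairs by mean-pooled summaries.
    block_scores = qb.mean(axis=1) @ kb.mean(axis=1).T      # (nb, nb)

    # Keep the top_k highest-scoring key blocks for each query block.
    selected = np.argsort(-block_scores, axis=1)[:, :top_k]  # (nb, top_k)

    # Gather the shared keys and values for each query block.
    k_sel = kb[selected].reshape(nb, top_k * block_size, d)
    v_sel = vb[selected].reshape(nb, top_k * block_size, d)

    # Dense softmax attention restricted to the selected blocks.
    logits = qb @ k_sel.transpose(0, 2, 1) / np.sqrt(d)
    logits = logits - logits.max(axis=-1, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return (weights @ v_sel).reshape(n, d)

# Usage: 64 tokens, 8 channels, blocks of 8 tokens, 2 key blocks per query block.
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((64, 8)) for _ in range(3))
out = block_sparse_attention(q, k, v, block_size=8, top_k=2)
print(out.shape)
```

With `top_k` and `block_size` held fixed, the attention step touches `top_k * block_size` keys per query, so its cost grows linearly with the number of tokens. The block-scoring step in this sketch is still quadratic in the number of blocks; how Mosaic performs block selection is not stated in the abstract.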