Fixing the Double Penalty in Data-Driven Weather Forecasting Through a Modified Spherical Harmonic Loss Function

Published: 01 May 2025, Last Modified: 18 Jun 2025 | ICML 2025 poster | Everyone | CC BY-ND 4.0
TL;DR: This paper proposes a simple modification to the mean squared error loss function that eliminates the problem of overly smooth fine scales in data-driven weather forecasts.
Abstract: Recent advancements in data-driven weather forecasting models have delivered deterministic models that outperform the leading operational forecast systems based on traditional, physics-based models. However, these data-driven models are typically trained with a mean squared error loss function, which causes smoothing of fine scales through a "double penalty" effect. We develop a simple, parameter-free modification to this loss function that avoids this problem by separating the loss attributable to decorrelation from the loss attributable to spectral amplitude errors. Fine-tuning the GraphCast model with this new loss function results in sharp deterministic weather forecasts, an increase of the model's effective resolution from 1,250 km to 160 km, improvements to ensemble spread, and improvements to predictions of tropical cyclone strength and surface wind extremes.
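The split between decorrelation and spectral amplitude error can be illustrated on a toy problem. The sketch below is not the paper's spherical harmonic formulation and does not reproduce the modified loss: it assumes a periodic 1-D grid, uses a naive Fourier transform in place of spherical harmonics, and all names (`dft`, `spectral_split`, `sharp`, `flat`) are hypothetical. It only shows that ordinary mean squared error already decomposes, mode by mode, into an amplitude term and a decorrelation term, and that damping a decorrelated mode lowers the total.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform, normalized by N."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * math.pi * k * j / n) for j in range(n)) / n
        for k in range(n)
    ]


def mse(pred, truth):
    """Ordinary grid-point mean squared error."""
    return sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth)


def spectral_split(pred, truth):
    """Split the MSE into amplitude and decorrelation parts.

    For each mode, |X - Y|^2 = (|X| - |Y|)^2 + 2 * (|X||Y| - Re(X conj(Y))).
    With the 1/N normalization above, summing over modes recovers the MSE.
    """
    amplitude = 0.0
    decorrelation = 0.0
    for xk, yk in zip(dft(pred), dft(truth)):
        ax, ay = abs(xk), abs(yk)
        amplitude += (ax - ay) ** 2
        decorrelation += 2.0 * (ax * ay - (xk * yk.conjugate()).real)
    return amplitude, decorrelation


# Toy setup (assumed): one fine-scale wave that the forecast cannot phase-lock.
N, K = 32, 4
theta = [2.0 * math.pi * K * j / N for j in range(N)]
truth = [math.cos(t) for t in theta]
sharp = [math.cos(t + math.pi / 2) for t in theta]  # right amplitude, wrong phase
flat = [0.0] * N                                    # fully smoothed forecast

for name, pred in (("sharp", sharp), ("flat", flat)):
    amp, decor = spectral_split(pred, truth)
    print(f"{name}: mse={mse(pred, truth):.3f} "
          f"amplitude={amp:.3f} decorrelation={decor:.3f}")
```

In this toy case the full-amplitude but decorrelated forecast has an MSE of 1.0, all of it in the decorrelation term, while the flat forecast has an MSE of 0.5, all of it in the amplitude term. Plain MSE therefore prefers the smoothed forecast, which is the incentive the modified loss is meant to remove.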
Lay Summary: Data-driven weather forecasting is an emerging field that may soon overtake traditional numerical weather prediction for short- and medium-term forecasts. However, deterministic data-driven models (models asked to provide the single best guess of future weather) tend to "hedge their bets" and under-predict fine-scale variation and extreme weather. This arises from a "double penalty" during training with traditional loss functions. These loss functions punish both false positives and false negatives, so a model that correctly predicts a system like a hurricane but places it in the wrong location is punished twice: once for missing the "true" storm and once for developing a "false" storm that does not match the ground truth. We address this issue by developing a modified loss function for model training. This loss function has separate terms that distinguish the reward for accurately predicting the intensity of short- and long-wavelength fluctuations (spectral amplitude) from the reward for the forecast's correlation with the ground truth (spectral coherence). For large scales that are very predictable, this encourages the same forecast behaviour as traditional error measures, but for small scales that are chaotic and unpredictable, it encourages the model to still produce a realistic forecast. We fine-tune the ¼°, 13-level GraphCast model with this loss function. The resulting model shows realistic variation down to scales of 160 km (improved from 1,250 km), improves forecasts of tropical cyclone intensity, and modestly improves forecast variability in an ensemble setting.
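The misplaced-hurricane argument can be checked with a few lines of arithmetic. The sketch below is a toy illustration only: the 1-D periodic grid, the boxcar "storm", and the names `storm`, `displaced`, and `no_storm` are assumptions made for the example, not anything taken from the paper or its code.

```python
def mse(forecast, truth):
    """Ordinary grid-point mean squared error."""
    return sum((f - t) ** 2 for f, t in zip(forecast, truth)) / len(truth)


def storm(n, start, width=3, strength=1.0):
    """A toy 'storm': a boxcar anomaly on a periodic 1-D grid of n points."""
    field = [0.0] * n
    for i in range(start, start + width):
        field[i % n] = strength
    return field


N = 32
truth = storm(N, start=10)      # where the storm actually is
displaced = storm(N, start=20)  # correct storm, wrong location
no_storm = [0.0] * N            # forecast that predicts nothing at all

mse_displaced = mse(displaced, truth)
mse_no_storm = mse(no_storm, truth)

print(f"displaced storm: {mse_displaced:.5f}")
print(f"no storm:        {mse_no_storm:.5f}")
```

The displaced forecast is charged for the three points where it misses the true storm and again for the three points where it places a storm that is not there, so its error is exactly twice that of forecasting no storm at all. Under this kind of scoring, a model unsure of a storm's position is better off weakening or omitting it.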
Application-Driven Machine Learning: This submission is on Application-Driven Machine Learning.
Link To Code: https://github.com/csubich/graphcast/tree/amse
Primary Area: Applications->Chemistry, Physics, and Earth Sciences
Keywords: Loss functions, spherical harmonics, weather forecasting, tropical cyclones, fine tuning, double penalty
Submission Number: 11362