Keywords: forecasting, probabilistic forecasting, transfer learning, cross-frequency, benchmarking
TL;DR: Current benchmarks for foundation forecasting models (FFMs) are flawed; using 15 large-scale, leak-free datasets, we show statistical models still outperform FFMs, though synthetic pre-training narrows the gap.
Abstract: Cross-frequency transfer learning (CFTL) has emerged as a popular framework for curating large-scale time series datasets to pre-train foundation forecasting models (FFMs). Although recent advances in CFTL have shown promise, current benchmarking practices fall short of accurately assessing FFMs' performance. This shortcoming stems from many factors: an over-reliance on small-scale evaluation datasets; inadequate treatment of sample size when computing summary statistics; reporting of suboptimal statistical models; and a failure to account for the non-negligible risk of overlap between pre-training and test datasets. To address these limitations, we introduce a unified reimplementation of widely adopted forecasting neural networks and adapt them for the CFTL task; to prevent test leakage, we pre-train only on proprietary and synthetic data; and we evaluate on 15 large, diverse forecast competition datasets. Our empirical analysis reveals that statistical models' accuracy is frequently underreported. Notably, we confirm that statistical models and their ensembles consistently outperform existing FFMs by more than 8.2% in sCRPS and by more than 20% in MASE across datasets. We also find that pre-training on synthetic data improves FFMs' accuracy by 7%.
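For reference, a minimal sketch of the two headline metrics, assuming the standard seasonal-naive-scaled MASE definition and one common scaled-CRPS formulation; the paper's exact implementation may differ:

\[
\mathrm{MASE}
  = \frac{\frac{1}{H}\sum_{t=1}^{H}\lvert y_{t} - \hat{y}_{t}\rvert}
         {\frac{1}{T-s}\sum_{t=s+1}^{T}\lvert y_{t} - y_{t-s}\rvert},
\qquad
\mathrm{sCRPS}
  = \frac{2}{\sum_{t}\lvert y_{t}\rvert}
    \sum_{t=1}^{H}\int_{0}^{1}
      \mathrm{QL}_{q}\!\left(y_{t}, \hat{y}_{t}(q)\right)\, dq,
\]

where $H$ is the forecast horizon, $s$ the seasonal period, $T$ the in-sample length, and $\mathrm{QL}_{q}$ the quantile (pinball) loss at level $q$. Both metrics are scale-free, which is what makes them comparable across the 15 heterogeneous evaluation datasets.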
Submission Number: 14