Filter, Augment, Forecast: Online Data Selection for Robust Time Series Forecasting

Published: 03 Feb 2026, Last Modified: 03 Feb 2026 | Venue: AISTATS 2026 Poster | Readers: Everyone | License: CC BY 4.0
TL;DR: We introduce Filter, Augment, Forecast (FAF), a model-agnostic online data curation strategy based on RHO-LOSS that improves time series forecasting performance. We also provide a theoretical analysis highlighting the roles of the reference model, noise, and sample size.
Abstract: Data curation pipelines play a central role in training deep learning architectures, yet their impact on time series forecasting remains relatively underexplored. In this work, we propose Filter, Augment, Forecast (FAF): an online data curation strategy based on (1) data selection to filter out low-quality (e.g., noisy) examples and (2) augmentation of the remaining high-quality data. We use reference-model-based filtering inspired by reducible holdout loss selection (RHO-LOSS) from the language modeling literature. We identify limitations of RHO-LOSS under the domain shifts common in time series and introduce the adaptive RHO method (AdaRho), which improves performance by updating the reference model during training. Using random matrix theory, we further provide a statistical analysis that characterizes the role of the reference model, sample size, and noise statistics in data selection. FAF consistently improves forecasting accuracy across diverse architectures without modifying them, achieving state-of-the-art results. Specifically, applying FAF to eight state-of-the-art models yields a 6.55% mean reduction in MSE and a 3.79% mean reduction in MAE, averaged across nine benchmark datasets.
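As a rough illustration of the filter-then-augment idea described in the abstract, the sketch below scores each training window by its reducible loss (current-model loss minus reference-model loss), keeps the highest-scoring fraction, and perturbs the kept windows. It is not the authors' implementation: the function names, the keep_frac parameter, and the Gaussian jitter augmentation are all assumptions made for illustration, and the reference model is treated as fixed, so the AdaRho update is not shown.

```python
import numpy as np

def reducible_loss(model_loss, ref_loss):
    # RHO-LOSS-style score: loss under the current model minus loss under
    # the reference model. Noisy samples tend to have high loss under both,
    # so their score stays low.
    return np.asarray(model_loss, dtype=float) - np.asarray(ref_loss, dtype=float)

def select_top_k(model_loss, ref_loss, keep_frac=0.5):
    # Hypothetical helper: return indices of the samples with the largest
    # reducible loss, in descending order of score.
    scores = reducible_loss(model_loss, ref_loss)
    k = max(1, int(round(keep_frac * scores.size)))
    return np.argsort(scores)[::-1][:k]

def jitter(windows, sigma=0.01, rng=None):
    # Assumed augmentation (additive Gaussian noise); the abstract does not
    # say which augmentations FAF actually uses.
    rng = np.random.default_rng() if rng is None else rng
    return windows + rng.normal(0.0, sigma, size=windows.shape)

# Toy per-sample losses for a batch of four windows.
model_loss = np.array([0.9, 0.2, 1.5, 0.7])
ref_loss = np.array([0.1, 0.1, 1.4, 0.3])

kept = select_top_k(model_loss, ref_loss, keep_frac=0.5)

# Augment only the windows that survived filtering.
batch = np.zeros((4, 8))  # placeholder windows of length 8
augmented = jitter(batch[kept], sigma=0.01, rng=np.random.default_rng(0))
```

In this toy batch, the third window has the highest raw loss but is dropped, because the reference model also fits it poorly and its reducible loss is therefore small.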
Submission Number: 480