Filter, Augment, Forecast: Online Data Selection for Robust Time Series Forecasting

Ege Onur Taga; Halil Alperen Gozeten; Kutay Tire; Rahul Dalvi; Reinhard Heckel; Samet Oymak

Published: 09 Jun 2025, Last Modified: 02 Jul 2025. Venue: FMSD @ ICML 2025. License: CC BY 4.0

Keywords: time series forecasting, data selection, data augmentation, regression analysis

TL;DR: This paper proposes FAF, a model-agnostic method building on RHO-LOSS for online batch selection to improve time series forecasting.

Abstract: While significant effort has been devoted to developing deep learning architectures for time series forecasting, the role of data in the training pipeline remains relatively overlooked. In this work, we propose Filter, Augment, Forecast (FAF): an online data curation strategy based on (1) data selection to filter out low-quality (e.g., noisy) examples and (2) augmentation of the remaining high-quality data. We use reference-model-based filtering inspired by reducible holdout loss selection (RHO-LOSS) from the language modeling literature. We identify limitations of RHO-LOSS under the domain shifts common in time series and introduce the adaptive RHO method (AdaRho), which improves performance by updating the reference model during training. We provide a theoretical analysis using random matrix theory, highlighting the impact of reference models and noise on data selection. FAF improves forecasting accuracy across diverse architectures without altering them, achieving median reductions of 5.6% in MSE and 3.2% in MAE across nine datasets.
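
The reference-model-based filtering step can be illustrated with a minimal sketch. The abstract does not spell out the scoring rule, so this assumes the score from the original RHO-LOSS formulation: the current model's per-example loss minus the reference model's loss on the same example, with the top-k examples in each batch kept for the gradient step. The function names, the toy numbers, and the choice of per-example MSE are hypothetical, and the AdaRho reference-model update and the augmentation stage are not shown.

```python
import numpy as np

def per_example_mse(pred, target):
    """Mean squared error per example, averaged over the forecast horizon."""
    return np.mean((pred - target) ** 2, axis=-1)

def select_by_reducible_loss(model_loss, reference_loss, k):
    """Return indices of the k examples with the largest reducible loss.

    Assumed score: current-model loss minus reference-model loss.
    """
    reducible = np.asarray(model_loss) - np.asarray(reference_loss)
    # argsort is ascending; take the last k and reverse for descending order.
    return np.argsort(reducible)[-k:][::-1]

# Toy batch of four forecasts with horizon 3 (hypothetical numbers).
target = np.array([[1.0, 2.0, 3.0],
                   [0.0, 0.0, 0.0],
                   [5.0, 5.0, 5.0],
                   [1.0, 1.0, 1.0]])
model_pred = np.array([[1.0, 2.0, 3.0],   # already fit by the model
                       [2.0, 2.0, 2.0],   # high loss for both models
                       [3.0, 3.0, 3.0],   # high loss, reference fits it
                       [1.5, 1.5, 1.5]])  # small loss, reference fits it
reference_pred = np.array([[1.0, 2.0, 3.0],
                           [2.0, 2.0, 2.0],
                           [5.0, 5.0, 5.0],
                           [1.0, 1.0, 1.0]])

model_loss = per_example_mse(model_pred, target)
reference_loss = per_example_mse(reference_pred, target)
selected = select_by_reducible_loss(model_loss, reference_loss, k=2)
print(selected)
```

In this toy batch, example 1 has a high loss under both models, so its reducible loss is zero and it is filtered out as likely noise, while example 2 has a high loss only under the current model and is ranked first.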

Submission Number: 93
