Filter, Augment, Forecast: Online Data Selection for Robust Time Series Forecasting

Ege Onur Taga; Halil Alperen Gozeten; Kutay Tire; Rahul Dalvi; Reinhard Heckel; Samet Oymak

Published: 03 Feb 2026, Last Modified: 02 May 2026. Venue: AISTATS 2026 Poster. License: CC BY 4.0

TL;DR: We introduce Filter, Augment, Forecast (FAF), a model-agnostic online data curation strategy based on RHO-LOSS that boosts time series forecasting performance. We also provide theory, highlighting roles of the reference model, noise, and sample size.

Abstract: Data curation pipelines play a central role in training deep learning architectures, yet their impact on time series forecasting remains relatively underexplored. In this work, we propose Filter, Augment, Forecast (FAF): an online data curation strategy based on (1) data selection to filter out low-quality (e.g., noisy) examples and (2) augmentation of the remaining high-quality data. We use reference-model-based filtering inspired by the reducible holdout loss selection (RHO-LOSS) from the language modeling literature. We identify limitations of RHO-LOSS under the domain shifts common in time series and introduce the adaptive RHO method (AdaRho), which improves performance by updating the reference model during training. Using random matrix theory, we provide a statistical analysis that characterizes the roles of the reference model, sample size, and noise statistics in data selection. FAF consistently improves forecasting accuracy across diverse architectures without modifying them, achieving state-of-the-art results.
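A minimal sketch of the filter-then-augment loop described above, assuming a linear forecaster trained with squared error, a ridge-regression reference model fit on a clean holdout set, and small Gaussian input jitter as the augmentation. The selection score follows the usual RHO-LOSS definition (current-model loss minus reference-model loss), and each minibatch keeps its highest-scoring fraction. The names `train_faf`, `rho_select`, `jitter`, and `fit_reference` and all hyperparameters are hypothetical, not the authors' API; AdaRho's reference-model update is omitted because the abstract does not specify how or when it happens.

```python
import numpy as np


def per_sample_loss(w, X, y):
    """Squared error of a linear forecaster with weights w, one value per sample."""
    return (X @ w - y) ** 2


def fit_reference(X_hold, y_hold, lam=1e-3):
    """Ridge regression on held-out data; stands in for the reference model (assumption)."""
    d = X_hold.shape[1]
    return np.linalg.solve(X_hold.T @ X_hold + lam * np.eye(d), X_hold.T @ y_hold)


def rho_select(w, w_ref, X, y, keep_frac=0.5):
    """Indices of the samples with the largest reducible holdout loss,
    i.e. current-model loss minus reference-model loss, highest first."""
    scores = per_sample_loss(w, X, y) - per_sample_loss(w_ref, X, y)
    k = max(1, int(round(keep_frac * len(y))))
    return np.argsort(-scores)[:k]


def jitter(X, rng, sigma=0.01):
    """Hypothetical augmentation: add small Gaussian noise to the inputs."""
    return X + sigma * rng.standard_normal(X.shape)


def train_faf(X, y, X_hold, y_hold, steps=300, batch=64, lr=0.05,
              keep_frac=0.5, seed=0):
    """Filter each minibatch by RHO score, augment the survivors,
    then take one gradient step on the linear forecaster."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    w_ref = fit_reference(X_hold, y_hold)  # fixed here; AdaRho would update it
    for _ in range(steps):
        idx = rng.choice(len(y), size=batch, replace=False)
        keep = rho_select(w, w_ref, X[idx], y[idx], keep_frac)   # filter
        Xk, yk = jitter(X[idx][keep], rng), y[idx][keep]          # augment
        grad = 2.0 * Xk.T @ (Xk @ w - yk) / len(yk)
        w = w - lr * grad                                         # forecast update
    return w


# Toy usage: 30% of training targets are corrupted, the holdout set is clean.
rng = np.random.default_rng(1)
w_true = np.array([1.0, -1.0, 0.5])
X = rng.standard_normal((2000, 3))
y = X @ w_true
noisy = rng.random(2000) < 0.3
y[noisy] += 2.0 * rng.standard_normal(noisy.sum())
X_hold = rng.standard_normal((200, 3))
y_hold = X_hold @ w_true
w = train_faf(X, y, X_hold, y_hold)
```

Samples with noisy targets also incur high loss under the reference model, so their reducible loss stays small and they tend to be dropped; this is the standard RHO-LOSS intuition, shown here on a toy linear problem rather than a deep forecaster.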

Code Dataset Promise: No

Code Dataset Url: https://github.com/egetaga/FAF

Submission Number: 480

Loading