Adaptive Estimation and Learning under Temporal Distribution Shift

Dheeraj Baby; Yifei Tang; Hieu Duy Nguyen; Yu-Xiang Wang; Rohit Pyati

Published: 01 May 2025, Last Modified: 23 Jul 2025, ICML 2025 poster, CC BY 4.0

TL;DR: We propose wavelet denoising as a way to obtain point-wise estimation error guarantees when estimating the trend of a time series.

Abstract: In this paper, we study the problem of estimation and learning under temporal distribution shift. Consider an observation sequence of length $n$, which is a noisy realization of a time-varying ground-truth sequence. Our focus is to develop methods that estimate the ground truth at the final time step while providing sharp point-wise estimation error rates. We show that, *without prior knowledge* of the level of temporal shift, a wavelet soft-thresholding estimator provides an *optimal* estimation error bound for the ground truth. Our proposed estimation method generalizes existing work (Mazzetto and Upfal, 2023) by establishing a connection between the sequence's level of non-stationarity and its sparsity in the wavelet-transformed domain. Our theoretical findings are validated by numerical experiments. Additionally, we apply the estimator to derive sparsity-aware excess risk bounds for binary classification under distribution shift and to develop computationally efficient training objectives. As a final contribution, we draw parallels between our results and the classical signal processing problem of total-variation denoising (Mammen and van de Geer, 1997; Tibshirani, 2014), uncovering *novel optimal* algorithms for this task.
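To make the idea of wavelet soft-thresholding concrete, the following is a minimal sketch and not the authors' exact estimator. It assumes a Haar wavelet basis, a sequence length that is a power of two, a known noise level `sigma`, and the classical universal threshold $\sigma\sqrt{2\log n}$; all function names are hypothetical and chosen for illustration only. The sketch transforms the observations, shrinks the detail coefficients toward zero, inverts the transform, and reads off the denoised value at the final time step.

```python
import numpy as np


def haar_forward(y):
    """Orthonormal Haar transform of an array whose length is a power of two.

    Returns (approx, details): the coarsest scaling coefficient and a list of
    detail-coefficient arrays ordered from finest to coarsest scale.
    """
    approx = np.asarray(y, dtype=float)
    details = []
    while approx.size > 1:
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))
        approx = (even + odd) / np.sqrt(2.0)
    return approx, details


def haar_inverse(approx, details):
    """Invert haar_forward, rebuilding the sequence from coarse to fine."""
    approx = np.asarray(approx, dtype=float)
    for d in reversed(details):
        out = np.empty(2 * approx.size)
        out[0::2] = (approx + d) / np.sqrt(2.0)
        out[1::2] = (approx - d) / np.sqrt(2.0)
        approx = out
    return approx


def soft_threshold(x, lam):
    """Shrink each entry of x toward zero by lam, zeroing small entries."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)


def estimate_final_value(y, sigma):
    """Denoise y by Haar soft-thresholding and return the last fitted value.

    Assumption: the universal threshold sigma * sqrt(2 log n) is used here
    for illustration; the paper's threshold may differ.
    """
    n = len(y)
    if n == 0 or n & (n - 1) != 0:
        raise ValueError("length of y must be a power of two")
    lam = sigma * np.sqrt(2.0 * np.log(n))
    approx, details = haar_forward(y)
    details = [soft_threshold(d, lam) for d in details]
    return haar_inverse(approx, details)[-1]


if __name__ == "__main__":
    # Hypothetical data: a piecewise-constant trend observed with Gaussian noise.
    rng = np.random.default_rng(0)
    truth = np.concatenate([np.zeros(96), np.ones(32)])
    noisy = truth + 0.3 * rng.standard_normal(truth.size)
    print("estimate at final step:", estimate_final_value(noisy, sigma=0.3))
```

A sequence with few changes has few large Haar detail coefficients, so thresholding removes mostly noise; this is the sparsity connection the abstract refers to, illustrated here under the stated assumptions.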

Lay Summary: The paper proposes a versatile way of estimating quantities under temporal distribution shift. A simple example of the problem setup is estimating a trend from noisy univariate time-series observations. We show that wavelet-based denoising leads to optimal pointwise estimation error guarantees. The guarantee is not just worst-case optimal but also adapts naturally to the hardness of the problem: the more stationary the trend, the sharper the error rates. Further, these properties are attained with no hand tuning and no prior knowledge or assumptions about how fast the trend evolves. The methods are also applicable to training and evaluating models, especially in production pipelines where the data distribution of the collected training set is known to have shifted over time.

Primary Area: General Machine Learning

Keywords: Temporal Distribution Shift; Non-Stationary data; Total Variation Denoising

Submission Number: 8504
