Keywords: time series, data valuation
Abstract: Valuing temporal segments and individual time points within time series is crucial for tasks such as data curation and robust learning, yet it poses unique challenges. Existing data valuation methods often fail in this domain because they ignore the factors that determine a segment's value, such as local patterns, temporal dependencies, and the broader distributional context. To address this, we introduce TimeLAVA, a learning-agnostic framework that quantifies data value by measuring the discrepancy between distributions of temporal segments. At its core is a novel Selective Wavelet-based Wasserstein ($W_{SW}$) distance, which integrates multi-scale wavelet transforms to capture localized, intra-segment patterns and leverages unbalanced optimal transport to robustly handle non-stationarity and distributional shifts between sets of segments. The intrinsic value of each segment is then derived efficiently via a sensitivity analysis of the $W_{SW}$ distance, and point-wise values are aggregated from these segment values. We provide theoretical guarantees linking our segment-based valuation to model-agnostic generalization and demonstrate its robustness. Empirical validation on diverse real-world datasets shows that TimeLAVA significantly outperforms baselines at identifying influential and harmful temporal segments in applications such as anomaly detection, data pruning, and temporal label noise detection.
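The valuation pipeline summarized in the abstract can be illustrated with the minimal NumPy sketch below. The pipeline has four stages: wavelet features per segment, an unbalanced OT distance between segment sets, segment values from the distance's sensitivity, and point-wise values aggregated from segments. This is not the authors' implementation, and several choices are assumptions that the abstract does not state:

- Segments are fixed-length sliding windows.
- The wavelet is a plain Haar transform, and the hypothetical per-scale `weights` only stand in for the unspecified "selective" mechanism.
- The cost is the squared Euclidean distance between weighted coefficients.
- The unbalanced OT is entropic with KL-relaxed marginals (parameters `eps` and `rho`).
- Training segments are compared against a clean reference series.
- Sensitivity is taken as a LAVA-style calibrated gradient of the OT value with respect to each segment's mass.
- Point-wise values are plain averages over the segments covering each time index.

All function names are illustrative.

```python
# Hypothetical sketch of segment/point valuation via an unbalanced OT distance
# on Haar wavelet features; not the TimeLAVA reference implementation.
import numpy as np


def segment(x, length, stride):
    """Cut a 1-D series into overlapping windows; returns (segments, start indices)."""
    starts = np.arange(0, len(x) - length + 1, stride)
    return np.stack([x[s:s + length] for s in starts]), starts


def haar_features(segs, weights):
    """Weighted multi-level Haar DWT of each segment (assumed feature map).

    weights[k] scales the detail band of level k+1; weights[-1] scales the
    final approximation. A weight of 0 drops that band entirely.
    """
    levels = len(weights) - 1
    assert segs.shape[1] % (2 ** levels) == 0, "length must be divisible by 2**levels"
    approx = segs.astype(float)
    feats = []
    for w in weights[:-1]:
        even, odd = approx[:, 0::2], approx[:, 1::2]
        feats.append(w * (even - odd) / np.sqrt(2.0))  # detail band at this scale
        approx = (even + odd) / np.sqrt(2.0)           # coarser approximation
    feats.append(weights[-1] * approx)
    return np.concatenate(feats, axis=1)


def _lse(m, axis):
    """Numerically stable log-sum-exp along one axis."""
    mx = m.max(axis=axis, keepdims=True)
    return (mx + np.log(np.exp(m - mx).sum(axis=axis, keepdims=True))).squeeze(axis)


def unbalanced_sinkhorn(C, a, b, eps, rho, iters=500):
    """Log-domain Sinkhorn for entropic OT with KL-relaxed marginals (strength rho)."""
    tau = rho / (rho + eps)
    f, g = np.zeros(len(a)), np.zeros(len(b))
    log_a, log_b = np.log(a), np.log(b)
    for _ in range(iters):
        f = -tau * eps * _lse(log_b[None, :] + (g[None, :] - C) / eps, axis=1)
        g = -tau * eps * _lse(log_a[:, None] + (f[:, None] - C) / eps, axis=0)
    return f, g


def segment_values(train, ref, length=16, stride=4, weights=(1.0, 1.0, 0.0),
                   eps=0.05, rho=1.0):
    """Score each training segment by its sensitivity in the unbalanced OT value."""
    X, starts = segment(train, length, stride)
    Y, _ = segment(ref, length, stride)
    fx, fy = haar_features(X, weights), haar_features(Y, weights)
    C = ((fx[:, None, :] - fy[None, :, :]) ** 2).sum(axis=-1)
    C = C / max(C.max(), 1e-12)                        # rescale costs to [0, 1]
    a = np.full(len(X), 1.0 / len(X))
    b = np.full(len(Y), 1.0 / len(Y))
    f, _ = unbalanced_sinkhorn(C, a, b, eps, rho)
    # d(OT)/d(a_i) at the dual optimum, valid when sum(b) == 1 (envelope theorem)
    grad = (rho + eps) * (1.0 - np.exp(-f / rho))
    n = len(grad)
    calibrated = grad - (grad.sum() - grad) / (n - 1)  # move mass onto i from the rest
    return -calibrated, starts                          # higher = more valuable


def pointwise_values(seg_values, starts, length, n):
    """Average the values of all segments that cover each time index."""
    total, count = np.zeros(n), np.zeros(n)
    for v, s in zip(seg_values, starts):
        total[s:s + length] += v
        count[s:s + length] += 1
    return np.divide(total, count, out=np.full(n, np.nan), where=count > 0)


# Toy usage: a clean sine as reference, and a copy with an injected burst.
t = np.arange(256)
ref = np.sin(2 * np.pi * t / 16)
train = ref.copy()
train[100:116] += 3.0 * (-1.0) ** np.arange(16)  # high-frequency anomaly
vals, starts = segment_values(train, ref)
pw = pointwise_values(vals, starts, 16, len(train))
```

In this toy run, the segments overlapping the injected burst should receive the lowest values, and so should the time points they cover. The Haar transform is orthonormal, so weighting every band equally would make the cost identical to the Euclidean distance between raw segments. For that reason, the default weights drop the coarsest approximation, which removes each segment's local level from the cost. The gradient expression $(\rho+\epsilon)(1-e^{-f_i/\rho})$ follows from the dual of this particular unbalanced problem when the reference weights sum to one. It is only approximate after a finite number of iterations.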
Primary Area: learning on time series and dynamical systems
Submission Number: 6496