Center-Based Approximation of a Drifting Distribution

Published: 18 Dec 2024, Last Modified: 14 Feb 2025ALT 2025EveryoneRevisionsBibTeXCC BY 4.0
Abstract: We present a novel technique for computing a center-based approximation of a drifting distribution. Given $k \geq 1$ and a sequence of data, whose distribution is changing over time, the goal is to compute, at each step, the best $k$ centers representation of the current distribution, despite possibly having only a single sample from the most recent distribution. In data mining, this is traditionally attempted through the sliding-window mechanism, where the analysis is performed on the most recent fixed-size segment of the data. The problems with this approach are twofold: (1) setting the correct window size is challenging; and (2) a fixed window size cannot effectively track changes in the distribution happening at variable speed. In this paper, we propose a new methodology that dynamically adjusts the window size based on the recent drift of the data. The challenge is that it is not possible to explicitly estimate the drift, as we may have only a single data point from each distribution. Our main contribution lies in providing a rigorous mathematical analysis, establishing both an upper bound via a dynamic window size algorithm, and a lower bound that shows the tightness of our approach.
PDF: pdf
Submission Number: 118
Loading