Differentially Private Clustering in Data Streams

Published: 03 Feb 2026, Last Modified: 06 Feb 2026 | AISTATS 2026 Poster | Everyone | CC BY 4.0
TL;DR: We give the first sublinear-space differentially private streaming algorithms for $k$-means and $k$-median in the continual release model.
Abstract: Clustering tasks such as $k$-means and $k$-median are central in unsupervised learning, and streaming algorithms for these tasks are widely used to handle large or evolving datasets. When applied in sensitive domains, however, such algorithms must also provide rigorous privacy guarantees. In this work, we provide the first differentially private (DP) algorithms for $k$-means and $k$-median clustering of $d$-dimensional Euclidean data points over a stream of length at most $T$, using space that is sublinear in $T$, in the continual release setting where the algorithm is required to output a clustering at every timestep. We achieve (1) an $O(1)$-multiplicative approximation with $\tilde{O}(k^{1.5}\,\mathrm{poly}(d, \log T))$ space and $\mathrm{poly}(k, d, \log T)$ additive error, or (2) a $(1+\gamma)$-multiplicative approximation with $\tilde{O}_\gamma(\mathrm{poly}(k, 2^{O_\gamma(d)}, \log T))$ space for any $\gamma > 0$, with additive error $\mathrm{poly}(k, 2^{O_\gamma(d)}, \log T)$. Our main technical contribution is a DP clustering framework for data streams that only requires an offline DP coreset or clustering algorithm as a blackbox.
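For readers unfamiliar with the terminology, the following is a sketch of how a multiplicative approximation with additive error is usually formalized. The symbols $X_t$, $C_t$, $\mathrm{cost}_z$, $\mathrm{OPT}_k$, $\alpha$, and $\beta$ are notation assumed here for illustration and do not appear in the abstract; the paper's exact statement (including any success probability or dependence on the data domain) may differ.

$$
\mathrm{cost}_z(C, X) \;=\; \sum_{x \in X} \min_{c \in C} \lVert x - c \rVert_2^{\,z},
\qquad z = 2 \text{ for } k\text{-means}, \quad z = 1 \text{ for } k\text{-median},
$$

$$
\mathrm{OPT}_k(X) \;=\; \min_{C \subset \mathbb{R}^d,\; |C| = k} \mathrm{cost}_z(C, X),
$$

$$
\mathrm{cost}_z(C_t, X_t) \;\le\; \alpha \cdot \mathrm{OPT}_k(X_t) + \beta
\qquad \text{for every timestep } t \le T.
$$

Here $X_t$ denotes the points that have arrived up to timestep $t$ and $C_t$ the $k$ centers released at that timestep. Under this reading, result (1) corresponds to $\alpha = O(1)$ with $\beta = \mathrm{poly}(k, d, \log T)$, and result (2) to $\alpha = 1 + \gamma$ with $\beta = \mathrm{poly}(k, 2^{O_\gamma(d)}, \log T)$.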
Submission Number: 604