Differentially Private Clustering in Data Streams

Published: 18 Jun 2023, Last Modified: 30 Jun 2023
Venue: TAGML2023 Poster
Keywords: clustering, differential privacy, streaming, continual release, continual observation
TL;DR: We give the first differentially private algorithms for k-means and k-median clustering in both the continual release model and the one-shot setting.
Abstract: Clustering problems (such as $k$-means and $k$-median) are fundamental unsupervised machine learning primitives. Recently, these problems have attracted considerable interest in the privacy literature. All prior work on private clustering, however, has been devoted to the \emph{offline} case, where the entire dataset is known in advance. In this work, we focus on the more challenging private data stream setting, where the aim is to design memory-efficient algorithms that privately process a large stream \emph{incrementally} as points arrive. Our main contribution is to provide the first differentially private algorithms for $k$-means and $k$-median clustering in data streams. In particular, our algorithms are the first to guarantee differential privacy in both the continual release and the one-shot settings while using space sublinear in the stream size. We complement our theoretical results with an empirical analysis of our algorithms on real data.
Supplementary Materials: pdf
Type Of Submission: Extended Abstract (4 pages, non-archival)
Submission Number: 21