Differentially Private Space-Efficient Algorithms for Counting Distinct Elements in the Turnstile Model

Published: 01 May 2025, Last Modified: 18 Jun 2025, Venue: ICML 2025 poster, License: CC BY 4.0
Abstract: The *turnstile* continual release model of differential privacy captures scenarios where a privacy-preserving real-time analysis is sought for a dataset evolving through additions and deletions. In typical applications of real-time data analysis, both the length of the stream $T$ and the size of the universe $|\mathcal{U}|$ from which data come can be extremely large. This motivates the study of private algorithms in the turnstile setting using space sublinear in both $T$ and $|\mathcal{U}|$. In this paper, we give the first sublinear space differentially private algorithms for the fundamental problem of counting distinct elements in the turnstile streaming model. Our algorithm achieves, on arbitrary streams, $O_{\eta}(T^{1/3})$ space and additive error, and a $(1+\eta)$-relative approximation for all $\eta \in (0,1)$. Our result significantly improves upon the space requirements of the state-of-the-art algorithms for this problem, which are linear, while its additive error approaches the known $\Omega(T^{1/4})$ lower bound for arbitrary streams. Moreover, when a bound $W$ on the number of times an item appears in the stream is known, our algorithm provides $O_{\eta}(\sqrt{W})$ additive error, using $O_{\eta}(\sqrt{W})$ space. This additive error asymptotically matches that of prior work, which instead required linear space. Our results address an open question posed by Jain et al. about designing low-memory mechanisms for this problem. We complement these results with a space lower bound for this problem, which shows that any algorithm that uses similar techniques must use space $\Omega(T^{1/3})$.
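To make the quantity being estimated concrete, the following is a minimal sketch of an exact baseline that is neither private nor space-efficient. It is not the paper's algorithm. It assumes (the abstract does not specify this) that each update is an `(item, delta)` pair with `delta` in `{+1, -1}`, that `None` marks a step with no update, and that an item counts as present while its net count is positive. The name `distinct_counts` is hypothetical.

```python
from collections import Counter


def distinct_counts(stream):
    """Yield the exact number of distinct elements after each update.

    Illustrative baseline only: it adds no privacy noise and stores one
    counter per item seen, so its space is linear rather than sublinear.
    Assumed input: an iterable of (item, delta) pairs with delta in
    {+1, -1}, or None for a step with no update.
    """
    counts = Counter()
    present = 0  # number of items whose net count is currently positive
    for update in stream:
        if update is not None:
            item, delta = update
            before = counts[item]
            after = before + delta
            counts[item] = after
            if before <= 0 < after:
                present += 1  # item just became present
            elif after <= 0 < before:
                present -= 1  # item just became absent
        yield present


stream = [("a", +1), ("b", +1), ("a", +1), ("a", -1), None, ("a", -1), ("b", -1)]
print(list(distinct_counts(stream)))  # [1, 2, 2, 2, 2, 1, 0]
```

The dictionary of per-item counters is what makes this baseline linear in space; the abstract's contribution is to release a differentially private estimate of the same running count while using only $O_{\eta}(T^{1/3})$ space.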
Lay Summary: In our modern era of big data, there is a growing need to analyze fast-changing data streams -- such as social media activity, online transactions, or sensor feeds -- while protecting individual privacy. This becomes particularly challenging when the dataset is extremely large and constantly evolving, with items being both added and removed continuously. Existing privacy-preserving methods typically require memory that scales with the total size of the data, making them impractical for large-scale, real-time analysis. Our research introduces the first algorithm that can accurately and differentially privately count the number of distinct items in such evolving data streams of length $T$ while using sublinear $O(T^{1/3})$ space. Our algorithm provides strong privacy and accuracy guarantees on the count that it produces. By addressing a key open question from prior work, our results pave the way for more space-efficient, privacy-aware data analysis where real-time insights are crucial.
Primary Area: Social Aspects->Privacy
Keywords: differential privacy, counting distinct elements, streaming data, turnstile, continual observation, continual release, sublinear space
Submission Number: 8028