Abstract: Estimating the quantiles of a large dataset is a fundamental problem in both the streaming algorithms literature and the differential privacy literature. However, all existing private mechanisms for distribution-independent quantile computation require space at least linear in the input size $n$. In this work, we devise a differentially private algorithm for the quantile estimation problem, with strongly sublinear space complexity, in the one-shot and continual observation settings. Our basic mechanism estimates any $\alpha$-approximate quantile of a length-$n$ stream over a data universe $\mathcal{X}$ with probability $1-\beta$ using $O\left( \frac{\log (|\mathcal{X}|/\beta) \log (\alpha \epsilon n)}{\alpha \epsilon} \right)$ space while satisfying $\epsilon$-differential privacy at a single time point. Our approach builds upon deterministic streaming algorithms for non-private quantile estimation, instantiating the exponential mechanism with a utility function defined on sketch items while (privately) sampling from intervals defined by the sketch. We also present another algorithm based on histograms that is especially well-suited to the multiple quantiles case. We implement our algorithms and experimentally evaluate them on synthetic and real-world datasets.
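For readers unfamiliar with the exponential mechanism for quantiles, the following is a minimal sketch of the standard linear-space, non-streaming version: it sorts the full dataset, scores each interval between consecutive points by how far its rank is from the target rank, samples an interval with probability proportional to its width times the exponentiated score, and returns a uniform point from it. This is not the sublinear-space algorithm of the paper, which, per the abstract, defines the utility on sketch items rather than on the full sorted data. The function name `dp_quantile`, its parameters, and the clipping bounds `lo` and `hi` are hypothetical and chosen for illustration only; the floating-point sampling here is also not hardened against known precision attacks.

```python
import math
import random


def dp_quantile(data, q, epsilon, lo, hi, rng=None):
    """Illustrative exponential mechanism for the q-quantile (linear space).

    Assumes the data universe is the real interval [lo, hi] with lo < hi.
    All names here are hypothetical, not taken from the paper.
    """
    if not lo < hi:
        raise ValueError("need lo < hi")
    rng = rng or random.Random()

    # Clip to the assumed public range and sort (this is the linear-space step).
    xs = sorted(min(max(x, lo), hi) for x in data)
    n = len(xs)
    pts = [lo] + xs + [hi]          # interval i is [pts[i], pts[i + 1]], i = 0..n
    target = q * n                  # target rank

    # Utility of interval i is -|i - target|; it has sensitivity 1,
    # so the exponent is epsilon * utility / 2.
    log_w = []
    for i in range(n + 1):
        width = pts[i + 1] - pts[i]
        if width <= 0:
            log_w.append(-math.inf)  # empty interval gets zero weight
        else:
            log_w.append(math.log(width) - epsilon * abs(i - target) / 2)

    # Subtract the maximum before exponentiating for numerical stability.
    m = max(log_w)
    weights = [math.exp(w - m) for w in log_w]

    i = rng.choices(range(n + 1), weights=weights, k=1)[0]
    return rng.uniform(pts[i], pts[i + 1])
```

A call such as `dp_quantile(values, 0.5, epsilon=1.0, lo=0.0, hi=100.0)` returns a noisy median; larger `epsilon` concentrates the output closer to the true quantile at the cost of weaker privacy.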
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: Made several final edits, both to address reviewer comments and to clarify certain parts of the paper. The edits include:
1. Mentioned the space complexity of the GK sketch explicitly in the statement, and explained how it is computed from the list of tuples.
2. Cleaned up the space complexity expression: the erstwhile $\log (\alpha \epsilon n)$ factors were somewhat misleading and confusing. The explicit expression should be $\log (\alpha \min (\epsilon,1) n)$, so we might as well just write $\log n$.
3. Clarified the relation between the space complexity and the minimum stream length in the continual observation theorem statement, and rewrote parts of the proof for more clarity.
4. The constants in the sample complexity expression had accidentally been placed in the denominator instead of the numerator; we tracked this down wherever it occurred and fixed it.
5. Fixed alignment issues in Figures. Added funding info and acknowledgments.
Assigned Action Editor: ~Gautam_Kamath1
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Submission Number: 866