Lightweight Protocols for Distributed Private Quantile Estimation

Published: 01 May 2025 (Last Modified: 18 Jun 2025) · ICML 2025 spotlight poster · CC BY 4.0
TL;DR: By using a private noisy binary search, we achieve quantile estimates under local differential privacy using a number of users that grows with the log of the domain size whereas prior non-adaptive methods use a larger (log-squared) number of users.
Abstract: Distributed data analysis is a large and growing field, driven by a massive proliferation of user devices and by privacy concerns surrounding the centralised storage of data. We consider two \emph{adaptive} algorithms for estimating a single quantile (e.g.~the median) when each user holds one data point lying in a domain $[B]$ and can be queried once through a private mechanism: one under local differential privacy (LDP) and another under shuffle differential privacy (shuffle-DP). In the adaptive setting we present an $\varepsilon$-LDP algorithm that can estimate any quantile within error $\alpha$ using only $O(\frac{\log B}{\varepsilon^2\alpha^2})$ users, and an $(\varepsilon,\delta)$-shuffle-DP algorithm requiring only $\widetilde{O}((\frac{1}{\varepsilon^2}+\frac{1}{\alpha^2})\log B)$ users. Prior (non-adaptive) algorithms require more users by several logarithmic factors in $B$. We further provide a matching lower bound for adaptive protocols, showing that our LDP algorithm is optimal in the low-$\varepsilon$ regime. Additionally, we establish lower bounds against non-adaptive protocols which, paired with our understanding of the adaptive case, prove a fundamental separation between these models.
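As a concrete illustration of the adaptive approach described above (a minimal sketch, not the paper's exact protocol; the function names, batch-size parameter, and debiasing step are illustrative assumptions), an $\varepsilon$-LDP noisy binary search can halve the candidate interval each round using a fresh batch of users:

```python
import math
import random

def randomized_response(bit: bool, eps: float) -> bool:
    """Report the true bit with probability e^eps / (e^eps + 1),
    and its negation otherwise; this satisfies eps-LDP."""
    p_truth = math.exp(eps) / (math.exp(eps) + 1)
    return bit if random.random() < p_truth else not bit

def private_noisy_binary_search(data, B, eps, users_per_round):
    """Adaptive median search over the domain [1, B].

    Each round queries a fresh batch of users with the comparison
    "is your value <= mid?", privatized by randomized response,
    then halves the interval based on the debiased majority.
    """
    p = math.exp(eps) / (math.exp(eps) + 1)
    lo, hi, idx = 1, B, 0
    while lo < hi and idx + users_per_round <= len(data):
        mid = (lo + hi) // 2
        batch = data[idx:idx + users_per_round]
        idx += users_per_round  # each user is queried at most once
        yes = sum(randomized_response(x <= mid, eps) for x in batch)
        # Debias: E[yes/n] = p*q + (1-p)*(1-q), where q = Pr[x <= mid].
        q_hat = (yes / users_per_round - (1 - p)) / (2 * p - 1)
        if q_hat >= 0.5:
            hi = mid       # the median is at or below mid
        else:
            lo = mid + 1   # the median is above mid
    return lo
```

Because the search runs for roughly $\log_2 B$ rounds and spends a fixed batch of users per round, the total number of users scales with $\log B$, matching the adaptivity advantage claimed in the abstract.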
Lay Summary: Finding the middle or “typical” value in a dataset, such as the median income, is a common goal in statistical data analysis. But when the data is private, for example users’ personal information on their phones, it becomes tricky: how do we estimate this statistic without revealing any single user's personal data? This paper focuses on how to estimate quantiles (such as the median) when data stays on individual devices and is shared in a privacy-preserving way using local differential privacy (LDP). LDP protects users by adding noise to their answers before they are shared. For example, one can imagine asking users yes/no questions about their data, but instead of always answering truthfully, they tell you the correct answer only with probability 60% and otherwise lie with probability 40%. From any single answer it is impossible to make a qualified guess about the user's data, but by aggregating many such answers from different users one can start to understand statistics of the dataset. Before this work, however, existing methods required many such noisy answers from users in order to get accurate results. We propose a smarter method that asks better questions in multiple rounds, adapting the questions based on previous answers, like a game of “warmer/colder” that quickly homes in on the right value. This approach drastically reduces the number of users needed while still respecting privacy. In fact, we show that our method is optimal: under the constraints of LDP, no protocol can do better in a strong mathematical sense. We further combine this with a shuffling technique that mixes up users’ responses, a process which boosts user privacy even further.
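The 60%/40% yes/no mechanism described above is classic randomized response, and the aggregation step can be sketched in a few lines (an illustrative example, not code from the paper; the helper names are hypothetical):

```python
import random

def rr_answer(truth: bool, p_truth: float = 0.6) -> bool:
    """Answer a yes/no question truthfully with probability p_truth
    (60% in the example above) and lie otherwise."""
    return truth if random.random() < p_truth else not truth

def estimate_fraction(noisy_answers, p_truth: float = 0.6) -> float:
    """Recover the true fraction of "yes" users from noisy answers.

    The expected noisy yes-rate is p*q + (1-p)*(1-q) when the true
    fraction is q, so inverting that affine relation debiases the
    aggregate even though each individual answer stays deniable.
    """
    yes_rate = sum(noisy_answers) / len(noisy_answers)
    return (yes_rate - (1 - p_truth)) / (2 * p_truth - 1)
```

For instance, with 100,000 users of whom 30% truly hold "yes", the debiased estimate concentrates near 0.30, while no single reported answer reveals whether its sender told the truth.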
Link To Code: https://github.com/NynsenFaber/Quantile_estimation_with_adaptive_LDP
Primary Area: Social Aspects->Privacy
Keywords: local differential privacy, adaptive algorithms, quantiles, shuffle model
Submission Number: 10530