Near-Optimal Relative Error Streaming Quantile Estimation via Elastic Compactors

Published: 07 Jan 2025, Last Modified: 25 Jan 2026. Venue: ACM-SIAM Symposium on Discrete Algorithms (SODA) 2025. License: CC BY 4.0
Abstract: Computing the approximate quantiles or ranks of a stream is a fundamental task in data monitoring. Given a stream of elements $x_1, x_2, \dots, x_n$ and a query $x$, a relative-error quantile estimation algorithm estimates the rank of $x$ with respect to the stream up to a multiplicative $\pm \epsilon \cdot \textrm{rank}(x)$ error. Compared with the additive $\pm \epsilon n$ error regime, this requires the sketch to give more precise rank estimates for elements of small rank, i.e., in the tail of the distribution. This is particularly favorable for practical applications such as anomaly detection. Previously, the best known algorithms for relative error achieved space $\tilde O(\epsilon^{-1} \log^{1.5}(\epsilon n))$ (Cormode, Karnin, Liberty, Thaler, Veselý, 2021) and $\tilde O(\epsilon^{-2} \log(\epsilon n))$ (Zhang, Lin, Xu, Korn, Wang, 2006). In this work, we present a nearly optimal streaming algorithm for the relative-error quantile estimation problem using $\tilde O(\epsilon^{-1} \log(\epsilon n))$ space, which almost matches the trivial $\Omega(\epsilon^{-1} \log (\epsilon n))$ space lower bound. To surpass the $\Omega(\epsilon^{-1} \log^{1.5}(\epsilon n))$ barrier of the previous approach, our algorithm crucially relies on a new data structure, called an elastic compactor, which can be dynamically resized over the course of the stream. We design a space allocation scheme that adaptively allocates space to each compactor based on the "hardness" of the input stream. This lets us avoid using the maximal space for every compactor simultaneously, which enables the improvement in total space complexity. Along the way, we also propose and study a new problem, called the Top Quantiles Problem, which only requires the sketch to estimate the ranks of elements in a fixed-length tail of the distribution.
This problem serves as an important subproblem in our algorithm, though it is also interesting in its own right.
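To make the contrast between the two error regimes concrete, here is a minimal Python sketch that checks whether a rank estimate $\hat r$ satisfies the relative-error guarantee $|\hat r - \textrm{rank}(x)| \le \epsilon \cdot \textrm{rank}(x)$ or the additive guarantee $|\hat r - \textrm{rank}(x)| \le \epsilon n$. It is only an illustration of the guarantees, not the elastic-compactor algorithm; the function names are hypothetical, and it assumes that $\textrm{rank}(x)$ counts the stream elements less than or equal to $x$.

```python
from bisect import bisect_right


def exact_rank(sorted_stream: list[float], x: float) -> int:
    """Exact rank of x: number of stream elements <= x (assumed convention)."""
    return bisect_right(sorted_stream, x)


def within_relative_error(estimate: float, true_rank: int, eps: float) -> bool:
    """Relative-error guarantee: |estimate - rank(x)| <= eps * rank(x)."""
    return abs(estimate - true_rank) <= eps * true_rank


def within_additive_error(estimate: float, true_rank: int, n: int, eps: float) -> bool:
    """Additive-error guarantee: |estimate - rank(x)| <= eps * n."""
    return abs(estimate - true_rank) <= eps * n


if __name__ == "__main__":
    stream = list(range(1, 10_001))  # n = 10,000 distinct values
    sorted_stream = sorted(stream)
    n, eps = len(stream), 0.01

    # Query deep in the lower tail (true rank 5), estimate off by 50.
    r_tail = exact_rank(sorted_stream, 5)
    est_tail = r_tail + 50
    print(r_tail,
          within_additive_error(est_tail, r_tail, n, eps),
          within_relative_error(est_tail, r_tail, eps))

    # Query at the median (true rank 5000), estimate off by 40.
    r_mid = exact_rank(sorted_stream, 5000)
    est_mid = r_mid + 40
    print(r_mid,
          within_additive_error(est_mid, r_mid, n, eps),
          within_relative_error(est_mid, r_mid, eps))
```

With $n = 10{,}000$ and $\epsilon = 0.01$, an estimate off by 50 for a query of true rank 5 passes the additive check (allowed slack $\epsilon n = 100$) but fails the relative one (allowed slack $0.05$), while the same-sized error near the median is acceptable under both. This is why a relative-error sketch must be far more precise in the tail.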