PiqSketch: An Efficient Sketching Algorithm for Per-Key Tail Quantile Estimation in Large-Scale Data Streams
Abstract: Approximate stream processing has proven effective for a range of measurement tasks on large-scale data streams. Among these tasks, per-key tail quantile estimation is important and useful in many scenarios, yet it remains relatively unexplored. In this paper, we propose PiqSketch, a sketching algorithm that estimates per-key tail quantiles accurately and efficiently. PiqSketch employs a multi-level, bucket-based sketch structure and hierarchical algorithms that process key information and value information separately. By fully exploiting the two-dimensional skewness of key-value data streams, PiqSketch achieves high accuracy and throughput. To further improve performance, we apply four optimizations: Hierarchical Cell Shifting and Hierarchical Slot Shifting dynamically allocate adequate memory to objects of different scales, High Ranking Merging aggregates redundant information to save memory, and SIMD Acceleration improves efficiency through data parallelism. Extensive experimental results on real-world datasets show that PiqSketch significantly outperforms state-of-the-art solutions, with 3.87 times higher accuracy, 2.27 times higher insertion throughput, and 9.15 times higher query throughput on average.
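To make the task concrete, the following is a minimal sketch of the exact, memory-unbounded computation that a per-key tail quantile sketch approximates. It is not PiqSketch and does not reflect the paper's data structure; the function name `exact_per_key_quantile`, the `(key, value)` stream format, and the nearest-rank quantile definition are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def exact_per_key_quantile(stream, q):
    """Return {key: q-quantile of that key's values}, computed exactly.

    Hypothetical baseline: it stores every value of every key, which is
    the memory cost that a sketching algorithm is meant to avoid.
    """
    if not 0.0 < q <= 1.0:
        raise ValueError("q must be in (0, 1]")

    # Group the stream's values by key.
    values_by_key = defaultdict(list)
    for key, value in stream:
        values_by_key[key].append(value)

    result = {}
    for key, values in values_by_key.items():
        values.sort()
        # Nearest-rank definition (an assumption): the smallest value
        # such that at least q * n of the key's values are <= it.
        rank = max(1, math.ceil(q * len(values)))
        result[key] = values[rank - 1]
    return result


if __name__ == "__main__":
    # Example stream: key "a" has 100 values, key "b" has only 2.
    stream = [("a", v) for v in range(1, 101)] + [("b", 5), ("b", 7)]
    print(exact_per_key_quantile(stream, 0.5))
```

Tail quantiles correspond to values of `q` close to 1 (for example 0.95 or 0.99). Because this baseline keeps all values per key, its memory grows with the stream, which is the motivation for approximating the answer with a fixed-size sketch.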
External IDs: dblp:conf/adma/ZhouGSHDW24