CodingSketch: A Hierarchical Sketch with Efficient Encoding and Recursive Decoding

Published: 2024, Last Modified: 05 Aug 2024, ICDE 2024, CC BY-SA 4.0
Abstract: A sketch is a probabilistic data structure widely used in various fields because it achieves high accuracy with little memory. Designing hierarchical data structures for highly skewed real-world datasets is one of the main optimization directions for sketches. However, a large accuracy gap remains between existing sketches and the optimum. To fill this gap, we propose a new sketch called Coding Sketch. For the first time, we use both a hierarchical structure and nearly lossless encoding and decoding to compress frequent items, which significantly improves the accuracy of frequent items. In addition, we propose flagless pruning to remove the additional flag bits required by traditional hierarchical structures. Coding Sketch can thus optimize frequency estimation for both frequent and infrequent items. Our evaluation shows that our algorithm is 10 times more accurate than the state of the art under the same memory cost. All related code is open-sourced at https://github.com/CodingSketch/Coding-Sketc
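For readers unfamiliar with frequency-estimation sketches, the following is a minimal illustration of the general idea, using a plain Count-Min sketch. It is not Coding Sketch and does not reproduce the hierarchical structure, encoding and decoding, or flagless pruning described in the abstract. The class name `CountMinSketch` and the parameters `width` and `depth` are assumptions chosen for this example only.

```python
import hashlib


class CountMinSketch:
    """Baseline Count-Min sketch (illustrative only; NOT the paper's Coding Sketch)."""

    def __init__(self, width=1024, depth=4):
        # width and depth are hypothetical defaults, not values from the paper.
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # Derive one hash function per row by salting BLAKE2b with the row number.
        digest = hashlib.blake2b(
            item.encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        # Increment one counter in every row.
        for r in range(self.depth):
            self.rows[r][self._index(item, r)] += count

    def estimate(self, item):
        # Hash collisions can only inflate counters, so the minimum across rows
        # is the tightest available estimate and never underestimates.
        return min(self.rows[r][self._index(item, r)] for r in range(self.depth))


if __name__ == "__main__":
    # A small skewed stream: one frequent item and a few infrequent ones.
    sketch = CountMinSketch(width=64, depth=3)
    for item in ["a"] * 100 + ["b"] * 10 + ["c"]:
        sketch.add(item)
    print(sketch.estimate("a"), sketch.estimate("b"), sketch.estimate("c"))
```

In a flat sketch like this, every counter has the same bit width, so on skewed data most counters hold small values and waste their upper bits. Hierarchical sketches such as the one the abstract proposes target that inefficiency.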