SuperSketch: A Multi-Dimensional Reversible Data Structure for Super Host Identification

Published: 01 Jan 2022, Last Modified: 03 Aug 2024
IEEE Trans. Dependable Secur. Comput. 2022
CC BY-SA 4.0
Abstract: As network traffic volumes grow, effective data compression has become crucial for estimating host cardinalities and identifying super hosts. However, existing approaches face two challenges: they cannot simultaneously measure multiple types of host cardinality, and they cannot efficiently reconstruct the addresses of super hosts. To address these challenges, this article proposes a novel sketch data structure, named SuperSketch, that simultaneously measures multiple types of host cardinality in order to efficiently identify super hosts. SuperSketch has two key characteristics: multi-dimensionality and reversibility. Multi-dimensionality enables SuperSketch to measure Source Cardinality, Destination Cardinality, and Destination Port Cardinality simultaneously. Reversibility allows SuperSketch to accurately and quickly reconstruct the original addresses of super hosts once they are identified. We present both a theoretical analysis and a performance evaluation on real-world network traffic. Experimental results show that SuperSketch achieves outstanding performance in multi-cardinality measurement, super host identification, and host address reconstruction.
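The abstract names three host cardinalities without defining them on this page. The sketch below assumes the definitions commonly used in super-host (super spreader) detection: a host's source cardinality is the number of distinct destinations it sends to, its destination cardinality is the number of distinct sources that send to it, and its destination port cardinality is the number of distinct destination ports it sends to. It computes these exactly with per-host sets. This is the memory-heavy ground truth that a compact sketch such as SuperSketch is meant to approximate, not the SuperSketch algorithm itself. The `Packet` record, the thresholds, and all function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Set, Tuple


class Packet(NamedTuple):
    """Hypothetical packet/flow record with only the fields the three cardinalities need."""
    src: str
    dst: str
    dst_port: int


def _sizes(groups: Dict[str, set]) -> Dict[str, int]:
    # Replace each host's set of distinct peers (or ports) with its size.
    return {host: len(members) for host, members in groups.items()}


def exact_cardinalities(
    packets: Iterable[Packet],
) -> Tuple[Dict[str, int], Dict[str, int], Dict[str, int]]:
    """Exact per-host cardinalities, kept in per-host sets (memory grows with traffic).

    Assumed definitions (not taken from the paper):
      source cardinality of h           = distinct destinations h sends to
      destination cardinality of h      = distinct sources that send to h
      destination port cardinality of h = distinct destination ports h sends to
    """
    dsts_per_src: Dict[str, Set[str]] = defaultdict(set)
    srcs_per_dst: Dict[str, Set[str]] = defaultdict(set)
    ports_per_src: Dict[str, Set[int]] = defaultdict(set)
    for p in packets:
        dsts_per_src[p.src].add(p.dst)
        srcs_per_dst[p.dst].add(p.src)
        ports_per_src[p.src].add(p.dst_port)
    return _sizes(dsts_per_src), _sizes(srcs_per_dst), _sizes(ports_per_src)


def super_hosts(cardinality: Dict[str, int], threshold: int) -> List[str]:
    """Hosts whose cardinality is at least `threshold` (a hypothetical cutoff), sorted."""
    return sorted(host for host, count in cardinality.items() if count >= threshold)


# Usage: 10.0.0.1 contacts five destinations on five ports; 10.0.0.2 contacts one.
packets = [Packet("10.0.0.1", f"192.168.0.{i}", 80 + i) for i in range(5)]
packets.append(Packet("10.0.0.2", "192.168.0.1", 443))
src_card, dst_card, port_card = exact_cardinalities(packets)
print(super_hosts(src_card, 3))  # ['10.0.0.1']
print(super_hosts(dst_card, 2))  # ['192.168.0.1']
print(port_card["10.0.0.2"])     # 1
```

The exact sets grow with the number of distinct (host, peer) pairs, which is the cost that compression is meant to avoid. Because this baseline stores every address explicitly, recovering super host addresses is trivial here. A sketch that hashes addresses into shared counters discards them, which is why it needs a separate mechanism, reversibility in SuperSketch's case, to reconstruct them.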