GreedyGD: Enhanced Generalized Deduplication for Direct Analytics in IoT

Published: 01 Jan 2024, Last Modified: 13 May 2025. Venue: IEEE Trans. Ind. Informatics 2024. License: CC BY-SA 4.0
Abstract: The exponential growth of data generated by the Internet of Things (IoT) presents significant challenges for data communication, storage, and analytics. Consequently, organizations often face high costs when attempting to leverage their own data. Novel techniques that holistically optimize data storage and analytics in IoT systems are therefore required. One promising approach is generalized deduplication (GD), a lossless compression technique that delivers high compression while also enabling low-cost random access directly on compressed data. In this article, we introduce GreedyGD, a novel GD data compression algorithm that offers reliable, efficient data analytics, along with higher compression and faster runtime than previous GD compressors. An evaluation of GreedyGD on 18 real-world datasets showed an 11.2× speed-up, 1.6× more compression, and more accurate and reliable analytics while using 4× less data, all relative to previous GD compressors.
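To illustrate the general idea behind generalized deduplication, the following is a minimal sketch and not the GreedyGD algorithm itself. It assumes non-negative integer samples and a fixed split in which the high-order bits of each sample form a "base" that is deduplicated and the low-order bits form a "deviation" that is stored verbatim. The names `gd_compress`, `gd_get`, `GDCompressed`, and the parameter `dev_bits` are hypothetical. How GreedyGD chooses which bits go into the base is not described in the abstract and is not reproduced here.

```python
from typing import NamedTuple


class GDCompressed(NamedTuple):
    bases: list[int]       # each distinct base, stored once
    base_ids: list[int]    # per-sample index into `bases`
    deviations: list[int]  # per-sample low-order bits, stored verbatim
    dev_bits: int          # number of low-order bits kept as the deviation


def gd_compress(samples: list[int], dev_bits: int) -> GDCompressed:
    """Split each sample into (base, deviation) and deduplicate the bases.

    Assumption: samples are non-negative integers and the split point
    `dev_bits` is fixed by the caller.
    """
    mask = (1 << dev_bits) - 1
    base_index: dict[int, int] = {}
    bases: list[int] = []
    base_ids: list[int] = []
    deviations: list[int] = []
    for s in samples:
        base, dev = s >> dev_bits, s & mask
        if base not in base_index:
            base_index[base] = len(bases)
            bases.append(base)
        base_ids.append(base_index[base])
        deviations.append(dev)
    return GDCompressed(bases, base_ids, deviations, dev_bits)


def gd_get(c: GDCompressed, i: int) -> int:
    """Random access: rebuild sample i without decompressing the others."""
    return (c.bases[c.base_ids[i]] << c.dev_bits) | c.deviations[i]


samples = [1000, 1003, 1001, 2050, 2049, 1002]
compressed = gd_compress(samples, dev_bits=4)
restored = [gd_get(compressed, i) for i in range(len(samples))]
```

In this toy input, six samples share only two distinct bases, so each sample costs a small base ID plus a 4-bit deviation, and any single sample can be reconstructed exactly from its own ID and deviation. Because similar samples share a base, the list of bases can also serve as a coarse summary of the data, which is one way to read the abstract's claim about analytics directly on compressed data.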