A Comprehensive Study of Present Data Deduplication

Published: 01 Jan 2021 · Last Modified: 17 Sept 2024 · HPCC/DSS/SmartCity/DependSys 2021 · CC BY-SA 4.0
Abstract: With the proliferation of the Internet of Things (IoT), various computing paradigms have been proposed and have developed rapidly in recent years, leading to an explosive growth in data volume. This growth places a heavy burden on contemporary server storage, and many approaches have been designed to address it. Among them, data deduplication is a highly effective data reduction technique that has received considerable attention from both academia and industry in the field of large-scale storage systems. Data deduplication identifies redundant data at the chunk level using secure fingerprints; it not only removes replicated data and reduces bandwidth consumption, but also minimizes storage usage and cost. This paper describes the general framework of deduplication compression systems, covering both duplicate detection and resemblance detection in detail. First, we use a flow chart to present the overall framework and workflow of a deduplication compression system. We then summarize the existing algorithms applied to duplicate detection and resemblance detection, and provide a detailed evaluation of different resemblance detection algorithms. Finally, we conclude with a discussion of delta-compression prototype systems, outline the open problems, and shed light on future research directions in data deduplication systems.
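To make the chunk-level fingerprinting idea concrete, here is a minimal Python sketch of deduplication: the byte stream is split into fixed-size chunks, each chunk is fingerprinted with SHA-256, and only unseen chunks are stored. This is an illustrative simplification, not the paper's system; production deduplicators typically use content-defined chunking and a persistent fingerprint index, and all names below are invented for the example.

```python
import hashlib

def deduplicate(data: bytes, chunk_size: int = 4096):
    """Fixed-size chunking with SHA-256 fingerprints (illustrative only)."""
    store = {}   # fingerprint -> unique chunk payload
    recipe = []  # ordered fingerprints needed to rebuild the stream
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        fp = hashlib.sha256(chunk).hexdigest()
        store.setdefault(fp, chunk)  # write the chunk only on first sight
        recipe.append(fp)
    return store, recipe

def reconstruct(store, recipe):
    """Rebuild the original stream from the chunk store and the recipe."""
    return b"".join(store[fp] for fp in recipe)
```

For a stream of five 4 KiB chunks in which the same chunk repeats four times, the store holds just two unique chunks while the recipe preserves the full five-chunk layout, which is the bandwidth and storage saving the abstract refers to.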