Palantir: Hierarchical Similarity Detection for Post-Deduplication Delta Compression

Published: 01 Jan 2024, Last Modified: 30 Sept 2024. ASPLOS (2) 2024. CC BY-SA 4.0
Abstract: Deduplication compresses backup data by identifying and removing duplicate blocks. However, deduplication cannot detect when two blocks are very similar but not identical, which opens up opportunities for further data reduction using delta compression. Most existing approaches find similar blocks by characterizing each block with a set of features and matching similar blocks using coarse-grained super-features. If a new block shares a super-feature with a stored block, delta compression only needs to store the delta between them. Existing delta compression techniques constrain their super-features so that they match only blocks that are likely to be very similar. Palantir introduces hierarchical super-features with different sensitivities to block similarity, which find more candidate similar blocks and improve overall backup compression (including deduplication) by 7.3% over N-Transform and Odess and by 26.5% over Finesse. Palantir offsets the overhead of storing the additional super-features by exploiting temporal locality in backup streams, and its throughput penalty is within 7.7%. Palantir also introduces a false-positive filter that discards matching blocks that would be counterproductive for overall data reduction.
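The abstract describes the pipeline only at a high level: per-block features, super-features that group them, a lookup that treats a shared super-feature as a similarity hit, and, in Palantir, several super-feature levels plus a filter that rejects unprofitable matches. The Python sketch below illustrates that pipeline under explicit assumptions that do not come from the paper: features follow the N-Transform idea (the maximum of linearly transformed rolling hashes), "hierarchical" is modeled as super-features built from 4, 3 and 2 features with the strictest level probed first, zlib with a preset dictionary stands in for a real delta encoder, and the 20% gain threshold in the hypothetical `worth_delta` filter is arbitrary. All names and constants (`SimilarityIndex`, `LEVELS`, `WINDOW`, `min_gain`, and so on) are illustrative; this is not Palantir's implementation.

```python
from __future__ import annotations

import hashlib
import random
import zlib
from typing import Optional

# All constants below are illustrative assumptions, not values from the paper.
WINDOW = 48                 # sliding-window length in bytes
BASE = 0x01000193           # odd multiplier for the polynomial rolling hash
MASK = (1 << 32) - 1        # hash arithmetic is done mod 2^32
NUM_FEATURES = 12           # features per block
LEVELS = (4, 3, 2)          # features per super-feature, strictest level first

# One (a, b) pair per feature; an odd `a` makes h -> a*h + b a bijection mod 2^32.
_rng = random.Random(1234)
TRANSFORMS = [(_rng.randrange(1, MASK, 2), _rng.randrange(MASK))
              for _ in range(NUM_FEATURES)]


def window_hashes(block: bytes) -> list[int]:
    """Polynomial rolling hash of every WINDOW-byte window of the block."""
    w = max(1, min(WINDOW, len(block)))
    top = pow(BASE, w - 1, 1 << 32)       # weight of the byte leaving the window
    h = 0
    for byte in block[:w]:
        h = (h * BASE + byte) & MASK
    hashes = [h]
    for i in range(w, len(block)):
        h = ((h - block[i - w] * top) * BASE + block[i]) & MASK
        hashes.append(h)
    return hashes


def features(block: bytes) -> list[int]:
    """N-Transform-style features: the maximum transformed window hash per transform."""
    hashes = window_hashes(block)
    return [max((a * h + b) & MASK for h in hashes) for a, b in TRANSFORMS]


def super_features(feats: list[int], group: int) -> list[bytes]:
    """Hash each run of `group` consecutive features into one super-feature."""
    sfs = []
    for start in range(0, len(feats) - group + 1, group):
        data = b"".join(f.to_bytes(4, "little") for f in feats[start:start + group])
        sfs.append(hashlib.blake2b(data, digest_size=8).digest())
    return sfs


def delta_size(base: bytes, target: bytes) -> int:
    """Stand-in delta encoder: DEFLATE `target` with `base` as a preset dictionary."""
    comp = zlib.compressobj(9, zdict=base)
    return len(comp.compress(target) + comp.flush())


def worth_delta(base: bytes, target: bytes, min_gain: float = 0.2) -> bool:
    """Hypothetical false-positive filter: keep a match only if the delta is at
    least `min_gain` smaller than compressing `target` on its own."""
    return delta_size(base, target) <= (1 - min_gain) * len(zlib.compress(target, 9))


class SimilarityIndex:
    """Hypothetical multi-level super-feature index (one hash table per level)."""

    def __init__(self) -> None:
        self.tables = {k: {} for k in LEVELS}   # k -> {(position, sf): block_id}
        self.blocks: dict[str, bytes] = {}

    def store(self, block_id: str, block: bytes) -> tuple[str, Optional[int]]:
        feats = features(block)
        sfs = {k: super_features(feats, k) for k in LEVELS}
        result: tuple[str, Optional[int]] = ("unique", None)
        # Probe the strictest level first, then fall back to looser levels.
        for k in LEVELS:
            hit = next((self.tables[k][(pos, sf)] for pos, sf in enumerate(sfs[k])
                        if (pos, sf) in self.tables[k]), None)
            if hit is not None and worth_delta(self.blocks[hit], block):
                result = ("delta", k)
                break
        # Index the new block at every level so later blocks can match it.
        for k in LEVELS:
            for pos, sf in enumerate(sfs[k]):
                self.tables[k].setdefault((pos, sf), block_id)
        self.blocks[block_id] = block
        return result


rng = random.Random(0)
base = bytes(rng.randrange(256) for _ in range(4096))
edited = base[:100] + b"X" * 10 + base[110:]       # small in-place edit
unrelated = bytes(rng.randrange(256) for _ in range(4096))

index = SimilarityIndex()
print(index.store("a", base))       # ('unique', None): the index is empty
print(index.store("b", edited))
print(index.store("c", unrelated))
```

Grouping fewer features into a super-feature makes it easier to match: if each feature of two blocks agrees with probability p, a super-feature of k features agrees with probability roughly p^k, so the looser levels can surface less similar candidates that the strict level misses. In this toy run the edited block is expected to be reported as a delta candidate and the unrelated block as unique. The filter matters more at the looser levels, where a match can produce a delta that is no smaller than compressing the block alone.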