The Design of Fast Delta Encoding for Delta Compression Based Storage Systems

Published: 01 Jan 2024, Last Modified: 13 Nov 2024ACM Trans. Storage 2024EveryoneRevisionsBibTeXCC BY-SA 4.0
Abstract: Delta encoding is a data reduction technique capable of calculating the differences (i.e., delta) among very similar files and chunks. It is widely used for various applications, such as synchronization replication, backup/archival storage, cache compression, and so on. However, delta encoding is computationally costly due to its time-consuming word-matching operations for delta calculation. Existing delta encoding approaches either run at a slow encoding speed, such as Xdelta and Zdelta, or at a low compression ratio, such as Ddelta and Edelta. In this article, we propose Gdelta, a fast delta encoding approach with a high compression ratio. The key idea behind Gdelta is the combined use of five techniques: (1) employing an improved Gear-based rolling hash to replace Adler32 hash for fast scanning overlapping words of similar chunks, (2) adopting a quick array-based indexing for word-matching, (3) applying a sampling indexing scheme to reduce the cost of traditional building full indexes for base chunks’ words, (4) skipping unmatched words to accelerate delta encoding through non-redundant areas, and (5) last but not least, after word-matching, further batch compressing the remainder to improve the compression ratio. Our evaluation results driven by seven real-world datasets suggest that Gdelta achieves encoding/decoding speedups of 3.5X∼25X over the classic Xdelta and Zdelta approaches while increasing the compression ratio by about 10%∼240%.
Loading

OpenReview is a long-term project to advance science through improved peer review with legal nonprofit status. We gratefully acknowledge the support of the OpenReview Sponsors. © 2025 OpenReview