Demand-Aware Erasure Coding for Distributed Storage Systems

Published: 2021, Last Modified: 11 May 2023. IEEE Trans. Cloud Comput. 2021. Readers: Everyone
Abstract: Distributed storage systems provide cloud storage services by storing data on commodity storage servers. Conventionally, data are protected against failures of such commodity servers by replication. Erasure coding incurs less storage overhead than replication to tolerate the same number of failures and thus has been replacing replication in many distributed storage systems. However, with erasure coding, the overhead of reconstructing data after failures also increases significantly. Under ever-changing workloads where data accesses can be highly skewed, it is challenging to deploy erasure coding with appropriate parameter values to achieve a good trade-off between storage overhead and reconstruction overhead. In this paper, we propose Zebra, a framework that encodes data according to their demand into multiple tiers that deploy erasure codes with different parameter values. Zebra automatically determines the number of such tiers and dynamically assigns erasure codes with optimal parameter values to the corresponding tiers. With Zebra, a flexible trade-off between storage overhead and reconstruction overhead is achieved with multiple tiers. When demand changes, Zebra adjusts itself with a marginal amount of network transfer. We demonstrate that Zebra can work with two representative families of erasure codes in distributed storage systems, Reed-Solomon codes and local reconstruction codes.
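
The trade-off named in the abstract can be made concrete with a little arithmetic. The sketch below is not Zebra's algorithm; it only applies the standard textbook formulas for replication and for a (k, m) Reed-Solomon code with k data blocks and m parity blocks per stripe. The function names and the example parameters are illustrative assumptions, not taken from the paper.

```python
from fractions import Fraction


def replication(copies):
    """Return (storage overhead, tolerated failures, blocks read per repair)
    for plain replication with the given number of copies."""
    return Fraction(copies), copies - 1, 1


def reed_solomon(k, m):
    """Return the same triple for a (k, m) Reed-Solomon code:
    k data blocks plus m parity blocks per stripe.
    Rebuilding one lost block reads k surviving blocks."""
    return Fraction(k + m, k), m, k


# Illustrative parameters (assumed): both schemes tolerate 3 failures.
for name, scheme in [("4-way replication", replication(4)),
                     ("RS(6, 3)", reed_solomon(6, 3)),
                     ("RS(12, 3)", reed_solomon(12, 3))]:
    overhead, failures, reads = scheme
    print(f"{name}: {float(overhead):.2f}x storage, "
          f"tolerates {failures} failures, reads {reads} block(s) per repair")
```

For the same failure tolerance, a larger k lowers storage overhead but raises the number of blocks read during reconstruction, which is why a single fixed choice of parameters fits skewed demand poorly.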