Zebra: Demand-aware erasure coding for distributed storage systems

Published: 2016, Last Modified: 11 May 2023, IWQoS 2016, Readers: Everyone
Abstract: Erasure coding is increasingly replacing replication in distributed storage systems, thanks to its lower storage overhead at the same level of failure tolerance. However, as the storage overhead decreases, the reconstruction overhead of erasure codes can increase significantly. Under ever-changing workloads, in which data access can be highly skewed, it is difficult to achieve a good trade-off between storage overhead and reconstruction overhead. In this paper, we propose Zebra, a framework that encodes data into multiple tiers according to their demand. Given the overall storage overhead and the number of failures to tolerate, Zebra determines the erasure coding parameters of each tier by solving a geometric programming problem. Based on the demand for data, Zebra dynamically assigns data to the corresponding tiers to minimize the overall reconstruction overhead, and achieves a flexible trade-off between storage overhead and reconstruction overhead across tiers, such that hot data incurs lower reconstruction overhead and cold data is stored with lower storage overhead. When demand changes, Zebra adjusts itself accordingly with only a marginal amount of network transfer.
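To make the trade-off concrete, the following is a minimal sketch and not the paper's method. It assumes each tier uses a Reed-Solomon-style (k, r) code, where k data blocks are encoded with r parity blocks, the storage overhead is (k + r) / k, and rebuilding one lost block reads k blocks. The `Tier` class, the two helper functions, and all sample numbers are hypothetical illustrations. Zebra itself chooses the per-tier parameters by solving a geometric program, which this sketch does not attempt.

```python
from dataclasses import dataclass


@dataclass
class Tier:
    k: int         # data blocks per stripe (assumed Reed-Solomon-style code)
    r: int         # parity blocks per stripe, i.e. failures tolerated
    size: float    # amount of original data placed in this tier
    demand: float  # relative access demand of the data in this tier


def storage_overhead(tiers):
    """Total stored bytes divided by total original bytes."""
    original = sum(t.size for t in tiers)
    stored = sum(t.size * (t.k + t.r) / t.k for t in tiers)
    return stored / original


def reconstruction_overhead(tiers):
    """Demand-weighted number of blocks read to rebuild one lost block."""
    total_demand = sum(t.demand for t in tiers)
    return sum(t.demand * t.k for t in tiers) / total_demand


# One code for all data versus two tiers; both tolerate r = 2 failures.
single = [Tier(k=6, r=2, size=100, demand=100)]
tiered = [
    Tier(k=3, r=2, size=20, demand=80),   # hot data: small k, cheap repair
    Tier(k=12, r=2, size=80, demand=20),  # cold data: large k, compact
]

for name, layout in (("single", single), ("tiered", tiered)):
    print(name, round(storage_overhead(layout), 3),
          round(reconstruction_overhead(layout), 3))
```

With these made-up numbers, the tiered layout stores data at about 1.27x instead of 1.33x while reading 4.8 blocks per demand-weighted repair instead of 6, which is the kind of gain skewed demand makes possible.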