Parallelism-Aware Locally Repairable Code for Distributed Storage Systems

Jun Li, Baochun Li

Published: 2018, Last Modified: 11 May 2023, ICDCS 2018, Readers: Everyone

Abstract: Distributed storage systems store a substantial amount of data across a large number of servers built with commodity hardware. To protect data against server failures, erasure coding has been deployed in many distributed storage systems because of its low storage overhead. In particular, since disk I/O is in many cases a bottleneck in the distributed storage system, locally repairable codes have been proposed, which incur low volumes of disk I/O when reconstructing missing data after server failures. However, since original data can only be read from specific servers, existing designs of locally repairable codes suffer from limited data parallelism. Moreover, if the performance of servers is heterogeneous, slow servers may become the bottleneck when data is accessed in parallel. In this paper, we propose Galloper codes, a novel family of locally repairable codes that achieve low disk I/O during reconstruction while extending data parallelism from specific servers to all servers. In addition, the amount of original data in each server can be arbitrarily determined based on the performance of the corresponding server. We have implemented a prototype of Galloper codes on Apache Hadoop, and our experimental results show that Galloper codes can reduce the completion time of MapReduce jobs by up to 42.9%, while performing comparably to existing locally repairable codes in terms of disk I/O overhead, encoding overhead, and reconstruction overhead.
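The local-repair property that the abstract refers to can be illustrated with a minimal sketch. This is not the Galloper construction, whose details the abstract does not give. It assumes the simplest possible layout, one XOR parity block per local group with no global parities, and all function names are hypothetical. It only shows why repairing a lost block needs to read a few blocks in its group rather than all of the data.

```python
from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def encode_local_groups(data_blocks, group_size):
    """Split data blocks into local groups and append one XOR parity per group.

    Assumption for illustration: a single XOR local parity per group,
    no global parities.
    """
    groups = []
    for i in range(0, len(data_blocks), group_size):
        group = list(data_blocks[i:i + group_size])
        groups.append(group + [xor_blocks(group)])
    return groups

def repair(group, lost_index):
    """Rebuild one lost block from the surviving blocks of its own group.

    Returns the rebuilt block and the number of blocks that were read.
    """
    survivors = [b for j, b in enumerate(group) if j != lost_index]
    return xor_blocks(survivors), len(survivors)

# Four data blocks in two local groups of two blocks each.
data = [b"aaaa", b"bbbb", b"cccc", b"dddd"]
groups = encode_local_groups(data, group_size=2)

# Losing the first block costs two block reads, not four.
rebuilt, blocks_read = repair(groups[0], lost_index=0)
```

In this layout the original data sits only in the data blocks of each group, which is the limited data parallelism the abstract describes for existing designs.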
