Abstract: This paper studies recovery from multiple node failures in distributed storage systems. We design a mutually cooperative recovery (MCR) mechanism for multiple node failures. Via a cut-based analysis of the information flow graph, we obtain a lower bound on the maintenance bandwidth of MCR. For MCR, we also propose a transmission scheme and design a linear network coding scheme based on (η, κ) strong-MDS codes, a generalization of (η, κ) MDS codes. We prove that the maintenance bandwidth achieved by our transmission and coding schemes matches the lower bound; hence the lower bound is tight, and the transmission and coding schemes for MCR are optimal. We also give numerical comparisons of MCR with other redundancy recovery mechanisms in terms of storage cost and maintenance bandwidth to show the advantage of MCR.