FullRepair: Towards Optimal Repair Pipelining in Erasure-Coded Clustered Storage Systems

Published: 01 Jan 2023, Last Modified: 16 May 2025, CLUSTER 2023, CC BY-SA 4.0
Abstract: Clustered storage systems often deploy erasure coding, which encodes data into coded chunks and distributes them across nodes to tolerate node failures. Erasure coding is a storage-efficient redundancy scheme but incurs a high repair penalty; some state-of-the-art approaches therefore pipeline the repair process to improve repair performance. However, we observe that all existing repair pipelining methods use only a single pipeline, leaving the network bandwidth of storage nodes underutilized. In this paper, we propose FullRepair, a new repair pipelining mechanism based on multiple pipelines that aims to fully exploit all available bandwidth resources during repair. We construct four constraints to model the repair pipelining problem so that we can obtain the optimal pipelined repair throughput under full bandwidth utilization. We design a multi-pipeline scheduling scheme for FullRepair that achieves this optimality. Experiments on Amazon EC2 show that, compared with the state-of-the-art repair pipelining methods RP and PivotRepair, FullRepair reduces the repair time of a single chunk by up to 45.40% and 33.19%, respectively.
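To illustrate the underutilization claim, the following is a minimal sketch of a toy model, not the paper's formulation: a single linear pipeline is assumed to run at the rate of its slowest hop, so every node with more bandwidth than that bottleneck is left with idle capacity. The node names, the bandwidth figures (in Mb/s), and both helper functions are hypothetical and chosen only for illustration.

```python
def chain_throughput(uplink, downlink, order):
    """Rate of one linear pipeline that visits nodes in `order`.

    Toy assumption: each hop is limited by the sender's uplink and the
    receiver's downlink, and the whole chain runs at its slowest hop.
    """
    hop_rates = [
        min(uplink[sender], downlink[receiver])
        for sender, receiver in zip(order, order[1:])
    ]
    return min(hop_rates)


def residual_uplink(uplink, order, rate):
    """Uplink bandwidth each sending node leaves unused at `rate`."""
    return {node: uplink[node] - rate for node in order[:-1]}


# Hypothetical bandwidths in Mb/s: three helpers and one requestor.
uplink = {"h1": 1000, "h2": 400, "h3": 900, "req": 1000}
downlink = {"h1": 1000, "h2": 1000, "h3": 800, "req": 1000}
order = ["h1", "h2", "h3", "req"]

rate = chain_throughput(uplink, downlink, order)
idle = residual_uplink(uplink, order, rate)
print(rate, idle)
```

In this toy setting the slow uplink of `h2` caps the whole chain, while `h1` and `h3` keep spare uplink capacity that a single pipeline cannot use; this is the kind of leftover bandwidth that a multi-pipeline design could, in principle, put to work.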