Abstract: Data centers store a massive amount of data in a large number of servers built by commodity hardware. To maintain data integrity against server failures, erasure codes have been extensively deployed in modern data centers to provide a higher level of failure tolerance with less storage overhead than replication. Yet, compared to replication, disseminating erasure-coded data from a source server into multiple servers will also take significantly more time. In this paper, we design and implement Mist, a new mechanism for disseminating erasure-coded data efficiently to multiple receiving servers (receivers) in data centers. Mist speeds up the dissemination process by building an efficient topology among the receivers with heterogeneous performance, so that coded data can be received from other receivers in a pipelined fashion, rather than directly from the source. Mist flexibly supports a wide range of erasure codes, without imposing constraints to the range of system parameters, and can be extended for specific erasure codes with better performance by taking advantage of the corresponding erasure code. We have implemented Mist in Python, and our experimental results in Amazon EC2 have demonstrated that the dissemination time can be reduced by up to 96.3 percent with different kinds of erasure codes.
0 Replies
Loading