Lazy repair with temporary redundancy (LRTR): reducing repair network traffic in erasure-coded storage
Abstract: Erasure coding has gained popularity in today's storage systems as a fault-tolerance method that offers low storage overhead and high reliability. However, it is hampered by high repair costs. Temporary failures in storage systems amplify this drawback, causing a large amount of unnecessary repair traffic. This creates a dilemma: traditional repair schemes cannot optimize repair traffic and reliability at the same time. To solve this problem, we propose LRTR, a lazy repair scheme with temporary redundancy. Instead of repairing failed chunks at a high network cost, LRTR ensures reliability by adding temporary redundancy, at low network overhead, to the surviving chunks of risky stripes (stripes with failed chunks). With reliability kept high, the repair of the failed chunks is then delayed to filter out unnecessary repairs, thus reducing repair network traffic. Extensive simulation results show that LRTR improves data reliability by 13.6 times and reduces repair network traffic by 43.3% compared with the traditional repair scheme.
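The intuition behind delaying repair can be illustrated with a toy traffic count. The following is a minimal sketch and not the authors' simulator: the `Failure` type, the `repair_traffic` function, the `temp_cost` parameter, and the assumption that a conventional repair reads `k` surviving chunks (as in a Reed-Solomon-style code) are all hypothetical choices made for illustration, and the numbers do not reproduce the results reported above.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Failure:
    """One failed chunk (hypothetical model).

    outage: time units until the chunk becomes available again,
            or None if the failure is permanent.
    """
    outage: Optional[float]


def repair_traffic(failures: Sequence[Failure], k: int,
                   delay: float, temp_cost: float = 0.0) -> float:
    """Count chunks transferred when repair is postponed by `delay`.

    Assumptions (not taken from the paper):
      * a conventional repair reads k surviving chunks;
      * when repair is delayed, each failure costs `temp_cost`
        chunks of traffic to create temporary redundancy.
    """
    traffic = 0.0
    for f in failures:
        if delay > 0:
            # Pay the (assumed small) cost of temporary redundancy.
            traffic += temp_cost
        # A temporary failure that ends within the delay needs no repair.
        recovered = f.outage is not None and f.outage <= delay
        if not recovered:
            traffic += k
    return traffic


# Two short outages, one long outage, one permanent failure.
failures = [Failure(5), Failure(10), Failure(120), Failure(None)]
eager = repair_traffic(failures, k=6, delay=0)
lazy = repair_traffic(failures, k=6, delay=15, temp_cost=1.0)
print(eager, lazy)
```

In this toy setting the delayed scheme skips the two short outages entirely, so its traffic is lower as long as the temporary redundancy is cheaper than a full repair. What the sketch leaves out is the reliability side of the trade-off, which is the part the temporary redundancy is meant to protect.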