Abstract: Matrix multiplication is a core operation in many data-intensive workloads. Because the matrices in today's workloads are large, the computation is commonly split into tasks executed on different servers. Stragglers are common in distributed computing, so various coding schemes have been proposed to mitigate them, and some further split each task into subtasks so that partially completed results from stragglers can be used. However, existing schemes ignore the order in which subtasks are executed, which makes their encoding and decoding unnecessarily complex. In this paper, we propose a series of constructions of straggler-leveraging coding schemes for matrix multiplication. We take the execution order of subtasks into account and construct the coding schemes based on the probability that an uncoded subtask is recovered by a coded subtask. As a result, our coding schemes significantly reduce encoding and decoding complexity while keeping the recoverability of incomplete uncoded subtasks arbitrarily controllable.
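The idea of recovering an incomplete uncoded subtask from a coded one can be illustrated with a minimal sketch. This is not the construction proposed in the paper, which is not specified in the abstract. It is a generic sum-coded example under assumed conditions: the input matrix A is split by rows into two blocks, two workers compute the uncoded products, and a third worker computes one coded product. The matrices, the block split, and the helper names `matmul`, `add`, and `sub` are all hypothetical.

```python
def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def add(X, Y):
    """Element-wise sum of two equally sized matrices."""
    return [[a + b for a, b in zip(r, s)] for r, s in zip(X, Y)]


def sub(X, Y):
    """Element-wise difference of two equally sized matrices."""
    return [[a - b for a, b in zip(r, s)] for r, s in zip(X, Y)]


# Illustrative inputs (assumed, not from the paper).
A = [[1, 2], [3, 4], [5, 6], [7, 8]]
B = [[1, 0], [2, 1]]

# Split A by rows into two blocks; each block defines one uncoded subtask.
A1, A2 = A[:2], A[2:]

# Worker 1 finishes its uncoded subtask A1 * B.
uncoded_1 = matmul(A1, B)

# Worker 3 finishes the coded subtask (A1 + A2) * B.
coded = matmul(add(A1, A2), B)

# Worker 2 is a straggler: A2 * B never arrives.
# Recover it from the coded result and the completed uncoded result.
recovered_2 = sub(coded, uncoded_1)

# Stack the row blocks to obtain the full product A * B.
result = uncoded_1 + recovered_2
```

In this sketch decoding costs a single matrix subtraction. Which subtasks are still incomplete when the coded result arrives depends on the order in which workers execute them, which is the aspect the abstract says existing schemes ignore.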
External IDs: dblp:conf/icc/Zou024