Abstract: Straggler nodes are well-known bottlenecks of distributed matrix computations which induce reductions in computation/communication speeds. A common strategy for mitigating such stragglers is to incorporate MDS (maximum distance separable) codes into the framework; this can achieve resilience against an optimal number of stragglers. However, these codes assign dense linear combinations of submatrices to the workers which increase the number of non-zero entries in the encoded matrices, and adversely affect the worker computation time. In this work, we develop a straggler-optimal distributed matrix computation approach where the assigned encoded submatrices are linear combinations of a small number of submatrices so that it is well suited for sparse input matrices. Numerical experiments conducted in Amazon Web Services (AWS) demonstrate up to 30% reduction in worker computation time and 100 times faster encoding compared to several recent methods.
Loading