Sequential addition of coded sub-tasks for straggler mitigationDownload PDFOpen Website

Published: 2020, Last Modified: 11 Sept 2023INFOCOM 2020Readers: Everyone
Abstract: Straggler mitigation can be achieved by redundant computation. In MDS redundancy method, a task is divided into k sub-tasks which are encoded to n coded sub-tasks, such that a task is completed if any k coded sub-tasks are completed. Two important metrics of interest are task completion time, and server utilization cost which is the aggregate completed work by all servers in this duration. We consider a proactive straggler mitigation strategy where n <sub xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">0</sub> out of n coded sub-tasks are started at time 0 while the remaining n - n <sub xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">0</sub> coded sub-tasks are launched when ℓ <sub xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">0</sub> ≤ min(n <sub xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">0</sub> , k) of the initial ones finish. The coded sub-tasks are halted when k of them finish. For this flexible forking strategy with multiple parameters, we analyze the mean of two performance metrics for the proposed forking strategy when the random service completion time at each server is independent and distributed identically to a shifted exponential. Our analysis demonstrates that the regime of n <sub xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">0</sub> <; k leads to higher mean service completion time and no change in mean server utilization cost as compared to no forking (n <sub xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">0</sub> = n), and is thus not a regime of interest. For n <sub xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">0</sub> ≥ k, we find that there is a tradeoff between the two performance metrics and leads to decrease in mean server utilization cost at the expense of mean service completion time and an efficient choice of the parameters is helpful.
0 Replies

Loading