Mitigating Service Variability in MapReduce Clusters via Task Cloning: A Competitive Analysis

Published: 2017, Last Modified: 17 May 2023, IEEE Trans. Parallel Distributed Syst. 2017, Readers: Everyone
Abstract: Measurement traces from real-world production environments show that the execution time of tasks within a MapReduce job varies widely due to the variability in machine service capacity. This variability makes efficient job scheduling over large-scale MapReduce clusters extremely challenging. To tackle this problem, we adopt the task cloning approach to mitigate the effect of machine variability and design corresponding scheduling algorithms that minimize the overall job flowtime in different scenarios. For offline scheduling, where all jobs arrive at the same time, we design an $O(1)$-competitive algorithm that gives priority to jobs with small effective workload. We then extend this offline algorithm to yield the so-called Smallest Remaining Effective Workload based $\beta$-fraction Sharing plus Cloning algorithm (SREW+C($\beta$)) for the online case. We also show that SREW+C($\beta$) is $(1+2\beta+\epsilon)$-speed $O(\frac{1}{\beta\epsilon})$-competitive with respect to the sum of job flowtime within a cluster. We demonstrate via trace-driven simulations that SREW+C($\beta$) can significantly reduce the overall job flowtime by cutting down the elapsed time of small jobs substantially. In particular, SREW+C($\beta$) reduces the total job flowtime by 14, 10, and 11 percent compared with Mantri, Dolly, and Grass, respectively.
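
To make the flowtime objective concrete, here is a minimal sketch of why giving priority to small jobs lowers the sum of job flowtime in the offline setting, where all jobs arrive at the same time. It is a toy illustration and not SREW+C($\beta$): it assumes a single unit-speed server, ignores cloning and machine variability, and uses plain workload in place of the paper's effective workload. The function name `total_flowtime` and the job sizes are hypothetical.

```python
from itertools import accumulate


def total_flowtime(workloads):
    """Sum of job flowtimes when jobs run back to back in the given order.

    Assumption: all jobs arrive at time 0 on one unit-speed server, so a
    job's flowtime (completion time minus arrival time) equals its
    completion time, which is the running sum of the workloads so far.
    """
    return sum(accumulate(workloads))


# Hypothetical workloads in arbitrary time units, not taken from the paper.
jobs = [8, 1, 3]

arrival_order = total_flowtime(jobs)           # completions at 8, 9, 12
smallest_first = total_flowtime(sorted(jobs))  # completions at 1, 4, 12

print(arrival_order, smallest_first)  # prints "29 17"
```

Running the small jobs first means fewer jobs wait behind the large one, which is the intuition behind prioritizing jobs by small (remaining) workload.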