Tiresias: Low-Overhead Sample Based Scheduling with Task Hopping

Published: 2016, Last Modified: 12 May 2023, CLUSTER 2016, Readers: Everyone
Abstract: Sample based distributed scheduling methods have been shown to be promising lower-overhead alternatives to their centralized counterparts. These methods can make fast decisions based on information gathered from just a small number of worker nodes instead of the whole cluster. Most recent works in the field adopt a combination of probe actions and worker-end queues in their design. However, as individual worker nodes become increasingly powerful thanks to rapid hardware evolution, we argue that one-node sampling is now a viable choice. Specifically, we show that it is now possible to achieve even lower scheduling latency by abolishing probes and worker-end queues altogether. With this insight, we introduce Tiresias, a low-overhead distributed scheduler based on one-node sampling and a novel task hopping mechanism. Compared with Sparrow's approach, experiments on a Google trace show that Tiresias could reduce Sparrow's 50th percentile and 90th percentile job runtimes by 20% and 60%, respectively. In addition, our experiments show that Tiresias is especially effective in reducing the delay of small jobs in clusters that are not highly loaded.