Abstract: A data-parallel job is typically modeled as a directed acyclic graph (DAG) consisting of multiple computation stages and cross-stage data transfers. Classical DAG scheduling strategies such as the critical path method ignore stages off the critical path and do not explicitly account for factors such as data locality and transfer costs. In practice, complicated DAGs contain multiple paths that overlap with each other, and the point where different paths intersect forms an important synchronization. Such synchronizations significantly affect job completion time; however, no known scheduling method is designed to speed them up, especially when complicated DAGs involve nested and hierarchical synchronizations. To this end, we propose a new abstraction, named branch, which refers to a disjoint path in a DAG, and design a branch scheduling (BS) method to decrease the average job completion time of multiple data-parallel jobs. The BS method leverages the urgency of branches to speed up the synchronization of multiple parallel branches. We have implemented the BS method on Apache Spark and conducted prototype-based experiments. Compared with Spark FIFO and shortest-job-first with the critical path method, the results show that the BS method reduces the average job completion time by around 10-15%.
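As an illustration of the branch abstraction described above, the following minimal Python sketch splits a small DAG into disjoint chains of stages that begin and end at fork or join points. The abstract does not spell out the paper's exact definition of a branch or of branch urgency, so the splitting rule, the function name `split_into_branches`, and the example stage names are assumptions made for illustration only.

```python
from collections import defaultdict


def split_into_branches(edges):
    """Split a DAG, given as (parent, child) stage pairs, into disjoint chains.

    Assumed rule (not taken from the paper): a stage opens a new branch if it
    is a source, a join (several parents), or a child of a fork.
    """
    succ = defaultdict(list)
    pred = defaultdict(list)
    order = {}  # remembers stages in first-seen order
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)
        order.setdefault(u, None)
        order.setdefault(v, None)

    def starts_branch(n):
        if len(pred[n]) != 1:
            return True  # source stage or join (synchronization) stage
        return len(succ[pred[n][0]]) > 1  # single parent is a fork

    branches = []
    for n in list(order):
        if not starts_branch(n):
            continue
        chain = [n]
        # Extend the chain while the next stage is its only continuation.
        while len(succ[chain[-1]]) == 1 and not starts_branch(succ[chain[-1]][0]):
            chain.append(succ[chain[-1]][0])
        branches.append(chain)
    return branches


# Hypothetical job: two parallel paths A->B and C->D synchronize at stage E.
edges = [("A", "B"), ("B", "E"), ("C", "D"), ("D", "E"), ("E", "F")]
print(split_into_branches(edges))
```

In this example the stages `A, B` and `C, D` form two parallel branches that must both finish before the join stage `E` can start, which is the kind of synchronization the abstract refers to. Every stage lands in exactly one chain, so the branches are disjoint.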