A New Approach for Scheduling Job with the Heterogeneity-Aware Resource in HPC Systems

Published: 01 Jan 2019, Last Modified: 11 Nov 2024, Venue: HPCC/SmartCity/DSS 2019, License: CC BY-SA 4.0
Abstract: Scheduling on High Performance Computing (HPC) platforms remains a challenge despite a large body of research, and a noticeable divergence persists between theory and practice. In particular, jobs are becoming increasingly heterogeneous, while a static scheduler is often hampered by the runtime behavior and the multiple resource demands of programs running on a specific system. Conventional job-scheduling solutions are only partially satisfactory, because a program's behavior may differ significantly from its behavior at the original scale. In this paper, we develop a dynamic scheduler based on the characteristics and runtime behaviors of jobs submitted to a CPU/coprocessor-based cluster. Fundamentally, the behaviors of large-scale programs on an HPC system can be extracted from history or log files and then used to build a relational model between job characteristics and an objective function. This makes the problem well suited to machine learning, which can be used to find a scheduling function for job submission. We use a two-stage approach: first, we use simulation to cover all job-execution scenarios and generate a dataset; second, we use that dataset to train machine learning models. The resulting model improves scheduling performance and captures well the relationship between job characteristics and the criteria used in practice. More importantly, our experiments highlight the influence of heterogeneity-aware resources on schedulers.
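The two-stage idea in the abstract (simulate job runs to build a dataset, then train a model that maps job characteristics to a scheduling decision) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the two job features (`size`, `parallel_fraction`), the toy cost model in `simulate_runtime`, its constants, and the use of a k-nearest-neighbors vote as a stand-in for whatever learning method and objective function the authors actually used.

```python
import random

RESOURCES = ("cpu", "coprocessor")


def simulate_runtime(size, parallel_fraction, resource):
    """Hypothetical cost model (not from the paper): runtime of one job."""
    if resource == "cpu":
        return size
    # Assumed coprocessor behavior: serial part runs 2x slower, parallel
    # part runs 8x faster, plus a fixed offload overhead of 5 time units.
    serial = (1.0 - parallel_fraction) * 2.0
    parallel = parallel_fraction / 8.0
    return size * (serial + parallel) + 5.0


def generate_dataset(n_jobs, seed=0):
    """Stage 1: simulate random jobs and label each with its faster resource."""
    rng = random.Random(seed)
    dataset = []
    for _ in range(n_jobs):
        size = rng.uniform(10.0, 1000.0)
        parallel_fraction = rng.random()
        best = min(
            RESOURCES,
            key=lambda r: simulate_runtime(size, parallel_fraction, r),
        )
        # Scale size into [0, 1] so both features weigh equally in distances.
        dataset.append(((size / 1000.0, parallel_fraction), best))
    return dataset


def predict(train, features, k=5):
    """Stage 2 stand-in: majority vote among the k nearest simulated jobs."""
    nearest = sorted(
        train,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], features)),
    )[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)


data = generate_dataset(600)
train, held_out = data[:500], data[500:]
accuracy = sum(predict(train, f) == y for f, y in held_out) / len(held_out)
```

Calling `predict(train, (0.5, 0.95))` asks the learned rule where to place a mid-sized, highly parallel job. A real scheduler would replace the toy cost model with traces extracted from history or log files, as the abstract describes, and would likely optimize a richer objective than single-job runtime.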