Intelligent Resource Scheduling at Scale: A Machine Learning Perspective

Published: 01 Jan 2018, Last Modified: 13 Nov 2024SOSE 2018EveryoneRevisionsBibTeXCC BY-SA 4.0
Abstract: Resource scheduling in a computing system addresses the problem of packing tasks with multi-dimensional resource requirements and non-functional constraints. The exhibited heterogeneity of workload and server characteristics in Cloud-scale or Internet-scale systems is adding further complexity and new challenges to the problem. Compared with,,,, existing solutions based on ad-hoc heuristics, Machine Learning (ML) has the potential to improve further the efficiency of resource management in large-scale systems. In this paper we,,,, will describe and discuss how ML could be used to understand automatically both workloads and environments, and to help to cope with scheduling-related challenges such as consolidating co-located workloads, handling resource requests, guaranteeing application's QoSs, and mitigating tailed stragglers. We will introduce a generalized ML-based solution to large-scale resource scheduling and demonstrate its effectiveness through a case study that deals with performance-centric node classification and straggler mitigation. We believe that an MLbased method will help to achieve architectural optimization and efficiency improvement.
Loading

OpenReview is a long-term project to advance science through improved peer review with legal nonprofit status. We gratefully acknowledge the support of the OpenReview Sponsors. © 2025 OpenReview