PPS: Fair and efficient black-box scheduling for multi-tenant GPU clusters

Published: 01 Jan 2024, Last Modified: 02 Oct 2024 | Parallel Comput. 2024 | CC BY-SA 4.0
Abstract: Highlights
• Existing GPU cluster scheduling strategies trade off between fairness and efficiency.
• Aggregate job statistics are predictable for large-scale production GPU clusters.
• Our scheduler, PPS, achieves both fairness and efficiency by predicting resource status.
• PPS is general, transparent, and secure because it is black-box and non-preemptive.
• Tailored procedures are designed to predict idle resources and resource demands.