Privacy-preserving job scheduler for GPU sharing

Published: 01 May 2023 · Last Modified: 03 Feb 2025 · IEEE/ACM 23rd International Symposium on Cluster, Cloud and Internet Computing Workshops · CC BY 4.0
Abstract: Machine learning (ML) training jobs are resource intensive, and the high infrastructure cost of computing clusters encourages multi-tenancy on GPU resources. This raises a scheduling problem: assigning multiple ML training jobs to a single GPU while minimizing task interference. Our paper introduces a clustering-based, privacy-preserving job scheduler that minimizes task interference without accessing sensitive user data. We perform an ML workload characterization, made publicly available [1], and carry out exploratory data analysis to cluster ML workloads. From these clusters, we build a knowledge base of inter- and intra-cluster task interference that the scheduler consults to minimize interference.
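The scheduling idea in the abstract can be sketched as follows: cluster jobs by coarse resource profiles (so no sensitive per-user data is needed), then consult a lookup table of measured inter/intra-cluster slowdowns when choosing which pending job to co-locate on a busy GPU. All feature names, centroid values, and interference numbers below are illustrative assumptions, not the paper's actual data or algorithm.

```python
import math

# Hypothetical per-job resource profiles (normalized 0-1):
# (gpu_util, gpu_mem, io_rate)
jobs = {
    "job_a": (0.90, 0.80, 0.10),
    "job_c": (0.85, 0.75, 0.15),
    "job_d": (0.25, 0.35, 0.65),
}

# Fixed centroids stand in for the paper's exploratory clustering step
# (cluster 0: compute-bound, cluster 1: I/O-bound -- assumed labels).
centroids = [(0.9, 0.8, 0.1), (0.2, 0.3, 0.7)]

def assign_cluster(profile):
    """Nearest-centroid assignment (a minimal k-means-style step)."""
    dists = [math.dist(profile, c) for c in centroids]
    return dists.index(min(dists))

clusters = {name: assign_cluster(p) for name, p in jobs.items()}

# Knowledge base: predicted slowdown when a job from cluster i shares a
# GPU with a job from cluster j (illustrative numbers; intra-cluster
# pairs interfere more because they contend for the same resource).
interference = {(0, 0): 0.40, (0, 1): 0.10, (1, 0): 0.10, (1, 1): 0.25}

def best_colocation(pending, running):
    """Pick the pending job whose cluster pairing with the running job
    minimizes predicted interference -- no raw user data is consulted,
    only cluster labels and the interference knowledge base."""
    rc = clusters[running]
    return min(pending, key=lambda j: interference[(clusters[j], rc)])

# job_c is compute-bound, so the I/O-bound job_d is the better partner.
print(best_colocation(["job_a", "job_d"], "job_c"))  # → job_d
```

The design point mirrored here is that the scheduler only ever sees cluster labels and aggregate interference statistics, which is what makes the approach privacy-preserving.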
