GOGH: Correlation-Guided Orchestration of GPUs in Heterogeneous Clusters

Published: 05 Nov 2025, Last Modified: 05 Nov 2025, Venue: NLDL 2026 Abstracts, License: CC BY 4.0
Keywords: Resource allocation, GPU cluster, heterogeneity, deep learning, integer linear programming
TL;DR: We propose an adaptive method for managing machine learning jobs; it uses two neural networks to cope with uncertainty in hardware utilization.
Abstract: In heterogeneous clusters whose GPUs differ in capability and energy efficiency, sustainable use of mixed-generation resources is essential. We propose a method for adaptive management of machine learning jobs that aims to minimize energy consumption while meeting performance targets; it uses two neural networks to cope with uncertainty in hardware utilization. We demonstrate the efficacy of this adaptive process on the Gavel benchmark [1].
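The abstract states the goal (minimize energy while meeting performance targets) and the keywords mention integer linear programming, but the formulation itself is not given here. The sketch below is a generic, hypothetical version of such a problem, not GOGH's actual model: each job is placed on one GPU of some type, subject to a per-type capacity limit and a minimum throughput per job, and the objective is total energy (power × remaining work / throughput). All GPU names, power draws, throughputs, work amounts, and targets are made-up illustrative values. In GOGH the uncertain utilization or throughput estimates would presumably come from the two neural networks; here they are replaced by a fixed table. For a handful of jobs, exhaustive search over assignments finds the same optimum an ILP solver would.

```python
from itertools import product

# Hypothetical inputs for illustration only (not measurements from GOGH or Gavel).
GPU_TYPES = ["V100", "P100", "K80"]
CAPACITY = {"V100": 1, "P100": 1, "K80": 2}              # GPUs of each type available
POWER_W = {"V100": 300.0, "P100": 250.0, "K80": 150.0}   # assumed average draw while busy

# THROUGHPUT[job][gpu] in iterations/second. These stand in for uncertain
# estimates (e.g. from learned predictors); here they are fixed assumptions.
THROUGHPUT = {
    "resnet": {"V100": 10.0, "P100": 6.0, "K80": 2.0},
    "lstm":   {"V100": 8.0,  "P100": 5.0, "K80": 3.0},
    "bert":   {"V100": 4.0,  "P100": 2.0, "K80": 0.5},
}
WORK_ITERS = {"resnet": 1000, "lstm": 600, "bert": 400}  # iterations left per job
MIN_RATE = {"resnet": 5.0, "lstm": 2.0, "bert": 1.5}     # performance target (it/s)


def energy_joules(job, gpu):
    """Energy = power * time, where time = remaining work / throughput."""
    return POWER_W[gpu] * WORK_ITERS[job] / THROUGHPUT[job][gpu]


def best_assignment():
    """Enumerate every job -> GPU-type assignment and keep the feasible one
    with the lowest total energy (the ILP optimum, for tiny inputs)."""
    jobs = sorted(THROUGHPUT)
    best, best_energy = None, float("inf")
    for choice in product(GPU_TYPES, repeat=len(jobs)):
        plan = dict(zip(jobs, choice))
        # Capacity constraint: number of jobs on type g <= CAPACITY[g].
        if any(choice.count(g) > CAPACITY[g] for g in GPU_TYPES):
            continue
        # Performance constraint: every job meets its minimum throughput.
        if any(THROUGHPUT[j][g] < MIN_RATE[j] for j, g in plan.items()):
            continue
        total = sum(energy_joules(j, g) for j, g in plan.items())
        if total < best_energy:
            best, best_energy = plan, total
    return best, best_energy


if __name__ == "__main__":
    plan, energy = best_assignment()
    print(plan, round(energy))
    # {'bert': 'V100', 'lstm': 'K80', 'resnet': 'P100'} 101667
```

With these illustrative numbers the search puts bert on the V100 and resnet on the P100, even though resnet is fastest on the V100: bert's larger speedup on the V100 saves more energy overall. At realistic scale, the same constraints would be handed to an ILP solver rather than enumerating all |GPU types|^|jobs| assignments.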
Serve As Reviewer: ~Mahdi_Dolati1
Submission Number: 40