Keywords: CPU Offload, Memory Management, Dynamic Scheduler
Abstract: A natural way to alleviate memory pressure in GPU-based ML workloads is CPU offload, where data are moved between GPU and CPU RAM. While CPU offload is useful, it can greatly slow down a computation because of the relatively low transfer bandwidth between CPU RAM and GPU RAM. Overlapping memory transfers with compute is therefore a necessity. In this paper, we present a new approach to CPU offload for ML workloads, called DISCO (**D**ynam**I**c **S**cheduling for **C**pu **O**ffload). DISCO views an ML workload as a fine-grained dataflow graph whose operations are individual kernel calls to be run on a specific GPU, CPU-to-GPU transfers, GPU-to-CPU transfers, and GPU-to-GPU transfers. DISCO uses a work-conserving, dynamic scheduler that asynchronously executes each operation in the graph as soon as the underlying resource is available and the system can guarantee that executing the operation cannot violate the correctness of the computation. In this way, DISCO ensures that all resources (the GPUs and the CPU-to-GPU bus) are kept fully utilized.
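The abstract does not include code, so the following is only a rough illustrative sketch of the idea it describes: a work-conserving dynamic scheduler that issues dataflow-graph operations as soon as their dependencies are done and their resource is free. All names here (`Resource`, `Op`, `schedule`, the simulated completion step) are assumptions for illustration, not DISCO's actual implementation.

```python
# Hypothetical sketch of work-conserving dynamic scheduling over a
# fine-grained dataflow graph; NOT DISCO's real code. A real system would
# issue ops asynchronously and react to completion events; here completion
# of in-flight ops is simulated synchronously to keep the sketch short.
from collections import defaultdict, deque
from dataclasses import dataclass, field
from enum import Enum, auto

class Resource(Enum):
    GPU0 = auto()      # kernel execution on a specific GPU
    H2D_BUS = auto()   # CPU-to-GPU transfers
    D2H_BUS = auto()   # GPU-to-CPU transfers

@dataclass
class Op:
    name: str
    resource: Resource
    deps: list = field(default_factory=list)  # ops that must finish first
    unfinished_deps: int = 0

def schedule(ops):
    """Execute each op as soon as (a) all its dependencies have finished
    and (b) its resource is idle -- i.e. work-conserving execution."""
    for op in ops:
        op.unfinished_deps = len(op.deps)
    dependents = defaultdict(list)
    for op in ops:
        for d in op.deps:
            dependents[d.name].append(op)
    ready = deque(op for op in ops if op.unfinished_deps == 0)
    busy = {}   # resource -> op currently "running" on it
    done = []
    while ready or busy:
        # Issue every ready op whose resource is idle (work conservation).
        for _ in range(len(ready)):
            op = ready.popleft()
            if op.resource not in busy:
                busy[op.resource] = op
            else:
                ready.append(op)  # resource occupied; retry next round
        # Simulate completion of all in-flight ops, releasing successors.
        for res, op in list(busy.items()):
            del busy[res]
            done.append(op.name)
            for succ in dependents[op.name]:
                succ.unfinished_deps -= 1
                if succ.unfinished_deps == 0:
                    ready.append(succ)
    return done

# Example: a transfer feeding a kernel feeding a transfer back to the CPU.
h2d = Op("copy_weights_to_gpu", Resource.H2D_BUS)
kern = Op("matmul_kernel", Resource.GPU0, deps=[h2d])
d2h = Op("copy_output_to_cpu", Resource.D2H_BUS, deps=[kern])
print(schedule([h2d, kern, d2h]))
# ['copy_weights_to_gpu', 'matmul_kernel', 'copy_output_to_cpu']
```

Because each resource (GPU, H2D bus, D2H bus) is issued independently, transfers and kernels from different parts of the graph can overlap whenever dependencies allow, which is the overlap of memory transfer and compute the abstract calls a necessity.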
Primary Area: infrastructure, software libraries, hardware, systems, etc.
Submission Number: 14629