GrOUT: Transparent Scale-Out to Overcome UVM's Oversubscription Slowdowns

Published: 01 Jan 2024, Last Modified: 18 Jul 2025
Venue: IPDPS (Workshops) 2024
License: CC BY-SA 4.0
Abstract: Hardware accelerators have always been difficult to approach. In recent years, considerable effort has gone into simplifying their programming paradigms, especially for GPUs. This has led to various domain-specific frameworks and microarchitectural features that ease some aspects of this multifaceted problem. One such feature is the Unified Virtual Memory (UVM) oversubscription mechanism, which lets developers handle datasets whose memory footprint exceeds the accelerator's memory. Although promising, current UVM incurs extreme overheads when running large workloads whose oversubscription factor (allocated vs. available memory) exceeds a per-workload threshold. In this work, we propose GrOUT, a language- and domain-agnostic framework that tackles the slowdowns caused by the UVM oversubscription mechanism. In particular, we show that a scale-out approach is a feasible solution to these slowdowns on workloads from various domains. Moreover, we design a framework capable of autonomously scaling out user-provided workloads, reaching a speedup of more than 24.42× with minimal changes to the application logic.