Abstract: As deep neural network models continue to grow in size, they place ever higher demands on memory capacity and allocation efficiency. NVIDIA GPUs are widely used in deep learning systems, and CUDA has introduced many new techniques in recent years to make memory management more efficient. However, achieving correct and efficient memory reuse is not easy. To the best of our knowledge, no existing literature comprehensively, clearly, and unambiguously analyzes and discusses the key points of memory reuse in CUDA graphs. This paper provides a systematic analysis of memory reuse in CUDA graphs through an empirical study of related issues. We clarify many previously unclear details and identify several key programming considerations. We believe our work will help programmers better unlock the potential of memory reuse while avoiding inadvertent mistakes.