Abstract: As deep neural network models continue to grow in size, they place ever higher demands on memory capacity and allocation efficiency. NVIDIA GPUs are widely used in deep learning systems, and CUDA has introduced many new techniques in recent years to make memory management more efficient. However, achieving correct and efficient memory reuse is not easy. To the best of our knowledge, no existing literature comprehensively, clearly, and unambiguously analyzes and discusses the key points of memory reuse in CUDA graphs. This paper provides a systematic analysis of memory reuse in CUDA graphs through an empirical study of related issues. We clarify many previously unclear details and identify several key programming considerations. We believe our work will help programmers better unlock the potential of memory reuse while avoiding inadvertent mistakes.