Abstract: With the growth of deep learning model sizes and the increasing amount of training data, training Deep Neural Networks (DNNs) within the confines of GPU memory has become a significant challenge. Existing techniques, including data compression, recomputation, and memory swapping, make it possible to train larger DNN models or increase their batch sizes under constrained memory conditions. However, these methods often overlook the performance impact of the memory management strategies employed by the operating system or the deep learning framework. This work is motivated by our observation that when the batch size of DNN training exceeds a certain threshold, PyTorch's memory management triggers its out-of-memory (OOM) handling mechanism, leading to substantial losses in training performance. In light of this, we propose an adaptive memory pool for DNN training, whose core design tracks tensor memory allocations at runtime and optimizes the global memory layout accordingly. Based on the recognized allocation patterns, the adaptive memory pool performs effective memory management. Experimental results show that, compared to large-batch DNN training on the native PyTorch framework, the adaptive memory pool achieves up to a 1.24x speedup.
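The abstract only summarizes the idea of runtime-tracked, pattern-based pooling; the sketch below is a minimal, hypothetical illustration of that general approach, not the authors' implementation. It assumes a simple scheme in which one training iteration is used to record the allocation pattern (class `AdaptiveMemoryPool`, its size-keyed bookkeeping, and the `finalize_layout` step are all invented names), after which later iterations reuse a pre-built pool instead of going to the native allocator.

```python
# Illustrative sketch of an allocation-pattern-aware memory pool (not the
# paper's implementation). One "tracking" iteration records how many blocks
# of each size are live at once; later iterations reuse pre-built blocks.

from collections import defaultdict

class AdaptiveMemoryPool:
    def __init__(self):
        self.tracking = True                   # first iteration: observe allocations
        self.observed = defaultdict(int)       # size -> peak number of concurrent blocks
        self._live = defaultdict(int)          # size -> currently outstanding blocks
        self.free_blocks = defaultdict(list)   # size -> reusable pooled blocks

    def allocate(self, nbytes):
        if self.tracking:
            # Record the peak number of simultaneously live blocks per size class.
            self._live[nbytes] += 1
            self.observed[nbytes] = max(self.observed[nbytes], self._live[nbytes])
            return ("raw", nbytes)             # fall back to the native allocator
        if self.free_blocks[nbytes]:
            return self.free_blocks[nbytes].pop()   # reuse a pooled block
        return ("raw", nbytes)                 # pool miss: fall back to the native allocator

    def free(self, block):
        kind, nbytes = block[0], block[1]
        if self.tracking:
            self._live[nbytes] -= 1
        if kind == "pooled":
            self.free_blocks[nbytes].append(block)

    def finalize_layout(self):
        # After the tracking iteration, pre-build the pool from the observed peaks
        # so later iterations reuse the same memory layout and avoid fragmentation.
        self.tracking = False
        for nbytes, count in self.observed.items():
            self.free_blocks[nbytes] = [("pooled", nbytes, i) for i in range(count)]


if __name__ == "__main__":
    pool = AdaptiveMemoryPool()
    # Iteration 0: track the allocation pattern of one training step.
    a = pool.allocate(4096); b = pool.allocate(4096)
    pool.free(a); pool.free(b)
    pool.finalize_layout()
    # Later iterations: identical requests are served from the pre-built pool.
    c = pool.allocate(4096)
    print("served from pool:", c[0] == "pooled")
```

The design choice sketched here mirrors the abstract's claim at a high level: because DNN training iterations repeat the same tensor allocation sequence, a layout derived from one tracked iteration can serve subsequent iterations without invoking the framework's OOM handling path.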
External IDs: dblp:conf/pricai/LiLDFYJ24