Informed Prefetching in I/O Bounded Distributed Deep Learning

IPDPS Workshops 2021
Abstract: Deep learning research has grown rapidly in the past decade, driven by significant performance improvements on GPUs. While the computing capability of current GPUs is tremendous, data pre-processing/loading becomes a potential bottleneck that incurs major training latency and adds overhead to both CPU and memory, especially when datasets are too large to fit in memory. When datasets are striped across distributed file systems, access to a remote storage node may significantly deteriorate I/O performance due to network I/O latency in the cloud. Moreover, some deep learning workloads may be assigned to remote GPU servers in Edge Computing, which results in even higher network I/O latency. Therefore, it is desirable to provide an efficient parallel and distributed prefetching solution that reduces the I/O cost of data pre-processing before feeding the data into GPUs for training on distributed storage systems in the Cloud or at the Edge. Although current deep learning frameworks such as PyTorch and TensorFlow offer multiprocessing data loading functionality, their approaches come at the price of high computing resource and memory usage. In this paper, we present IPDL, a novel thread-level Informed Prefetching Data Loader framework that supports efficient data prefetching from remote storage nodes in distributed deep learning environments and, potentially, in Edge Computing. Compared to its counterparts in PyTorch, IPDL accelerates I/O performance for data loading while consuming fewer computing resources and less memory. Extensive experiments on both an individual server and a cluster computing system have shown the superiority of IPDL over the latest implementation of PyTorch.
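To illustrate the thread-level prefetching idea the abstract contrasts with PyTorch's multiprocess DataLoader, below is a minimal Python sketch: a single background thread reads and pre-processes samples ahead of the training loop into a bounded queue. This is not the paper's IPDL implementation; the class name ThreadPrefetchLoader and the prefetch_depth parameter are illustrative assumptions.

```python
# Minimal sketch of thread-level prefetching (illustrative, not the IPDL API).
import threading
import queue

class ThreadPrefetchLoader:
    """Fetches samples ahead of the consumer on a background thread,
    buffering at most `prefetch_depth` pre-processed items."""

    def __init__(self, dataset, prefetch_depth=8):
        self.dataset = dataset
        self.buffer = queue.Queue(maxsize=prefetch_depth)
        self._sentinel = object()
        self._worker = threading.Thread(target=self._fill, daemon=True)
        self._worker.start()

    def _fill(self):
        # Producer: read (possibly remote) samples and pre-process them
        # before the training loop needs them, overlapping I/O with compute.
        for sample in self.dataset:
            self.buffer.put(sample)
        self.buffer.put(self._sentinel)

    def __iter__(self):
        # Consumer: yield buffered samples to the training loop.
        while True:
            item = self.buffer.get()
            if item is self._sentinel:
                break
            yield item

# Example usage (hypothetical dataset and training step):
# for batch in ThreadPrefetchLoader(my_dataset, prefetch_depth=16):
#     train_step(batch)
```

Because the worker is a thread rather than a separate process, the buffered samples stay in the same address space, avoiding the extra memory copies and per-worker process overhead that the abstract attributes to multiprocess loaders.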