PipePrune: Pipeline Parallel Based on Convolutional Layer Pruning for Distributed Deep Learning

Daojun Tan, Wenbin Jiang, Shang Qin, Hai Jin

Published: 2021, Last Modified: 12 Feb 2025, Venue: HPCC/DSS/SmartCity/DependSys 2021, License: CC BY-SA 4.0

Abstract: Pipeline parallelism, which combines pipelining with model parallelism and data parallelism, significantly improves the efficiency of distributed deep learning systems. However, it still falls short of ideal performance because of the bubbles and gaps caused by load imbalance across pipeline stages. To further exploit the potential of pipeline parallelism, we propose PipePrune, a novel approach that adds a convolutional layer pruning strategy to the pipeline to reduce these bubbles and gaps. For convolutional layers with heavy overheads, unimportant kernels, identified by their L1-norm, are pruned. This makes the processing overheads of the pipeline stages more balanced. Experimental results show that, compared with state-of-the-art pipeline methods, PipePrune noticeably improves training speed (e.g., for ResNet50 on ImageNet, it achieves about a 30% speedup with only a 1.1% loss in training accuracy).
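
The L1-norm criterion mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact procedure, so the following assumes the common convention that each output filter of a convolutional layer is scored by the sum of the absolute values of its weights and that the lowest-scoring filters are removed. The function name prune_filters_l1, the prune_ratio parameter, and the (out_channels, in_channels, height, width) weight layout are assumptions for illustration, not taken from the paper.

```python
import numpy as np


def prune_filters_l1(weight, prune_ratio):
    """Drop the filters with the smallest L1-norm (hypothetical helper).

    weight: array of shape (out_channels, in_channels, kh, kw) (assumed layout).
    prune_ratio: fraction of output filters to remove, in [0, 1).
    Returns the pruned weight tensor and the indices of the kept filters.
    """
    if not 0.0 <= prune_ratio < 1.0:
        raise ValueError("prune_ratio must be in [0, 1)")

    # L1-norm of each output filter: sum of absolute weights.
    l1 = np.abs(weight).sum(axis=(1, 2, 3))

    n_filters = weight.shape[0]
    n_prune = int(round(prune_ratio * n_filters))
    n_keep = max(1, n_filters - n_prune)  # always keep at least one filter

    # Indices of the n_keep largest norms, restored to original order.
    order = np.argsort(-l1, kind="stable")
    keep = np.sort(order[:n_keep])
    return weight[keep], keep
```

In a pipeline setting, such a function would presumably be applied only to the convolutional layers in the slowest stages, with prune_ratio chosen so that stage times become closer to one another. Removing output filters from one layer also requires removing the matching input channels of the following layer, which this sketch leaves out.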