AutoPipe: A Fast Pipeline Parallelism Approach with Balanced Partitioning and Micro-batch Slicing

Weijie Liu, Zhiquan Lai, Shengwei Li, Yabo Duan, Keshi Ge, Dongsheng Li

2022 (modified: 05 Nov 2022), CLUSTER 2022

Abstract: Pipeline parallelism is now widely used to train large DNN models. However, two main challenges remain for efficient pipeline parallelism: i) a balanced model partition is crucial for pipeline efficiency, yet prior work lacks a sound method for generating one automatically; and ii) startup overhead is inevitable and especially significant for deep pipelines, where it is a major source of pipeline bubbles and severely limits pipeline scalability. We propose AutoPipe to address both problems. AutoPipe contains i) a planner that automatically and quickly generates a balanced pipeline partition scheme using a fine-grained partitioner, which groups the DNN at sub-layer granularity and finds a balanced scheme with a heuristic search algorithm; and ii) a micro-batch slicer that reduces pipeline startup overhead, based on the planner's results, by splitting the micro-batch evenly. The slicer automatically determines an appropriate number of micro-batches to split. Experimental results show that AutoPipe accelerates training by up to 1.30x over the state-of-the-art distributed training framework Megatron-LM, with a 50% reduction in startup overhead and an order-of-magnitude reduction in pipeline planning time. Furthermore, AutoPipe Planner improves partition balance by 2.73x-12.7x compared to DAPPLE Planner and Piper.
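
To make the notion of a "balanced partition" concrete, the sketch below splits a sequence of per-block costs into contiguous pipeline stages so that the slowest stage is as fast as possible. It is not AutoPipe's heuristic search: it is a plain exhaustive dynamic program, and the function name `balanced_partition`, the cost values, and the idea of one scalar cost per sub-layer block are all assumptions made for illustration only.

```python
from typing import List, Tuple


def balanced_partition(costs: List[float], num_stages: int) -> Tuple[float, List[int]]:
    """Split `costs` into `num_stages` contiguous groups, minimizing the largest group sum.

    Returns (bottleneck_cost, starts), where starts[k] is the index of the
    first block assigned to stage k. Illustrative only; not AutoPipe's planner.
    """
    n = len(costs)
    if not 1 <= num_stages <= n:
        raise ValueError("need 1 <= num_stages <= len(costs)")

    # prefix[i] = total cost of blocks 0..i-1
    prefix = [0.0]
    for c in costs:
        prefix.append(prefix[-1] + c)

    inf = float("inf")
    # best[k][i] = minimal bottleneck when the first i blocks form k stages
    best = [[inf] * (n + 1) for _ in range(num_stages + 1)]
    # cut[k][i] = start index of the last stage in that optimal split
    cut = [[0] * (n + 1) for _ in range(num_stages + 1)]
    best[0][0] = 0.0

    for k in range(1, num_stages + 1):
        for i in range(k, n + 1):
            for j in range(k - 1, i):  # last stage holds blocks j..i-1
                cand = max(best[k - 1][j], prefix[i] - prefix[j])
                if cand < best[k][i]:
                    best[k][i] = cand
                    cut[k][i] = j

    # Walk the cut table backwards to recover each stage's start index.
    starts = []
    i = n
    for k in range(num_stages, 0, -1):
        i = cut[k][i]
        starts.append(i)
    starts.reverse()
    return best[num_stages][n], starts


if __name__ == "__main__":
    # Hypothetical per-block costs (e.g. milliseconds per micro-batch).
    bottleneck, starts = balanced_partition([4, 2, 3, 1, 5, 2, 3], num_stages=3)
    print(bottleneck, starts)
```

The dynamic program costs O(stages × blocks²) time, which grows quickly once a model is broken into many sub-layer blocks; that growth is one reason a planner might prefer a heuristic search at fine granularity, as the abstract describes.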
