MixPipe: Efficient Bidirectional Pipeline Parallelism for Training Large-Scale Models

Published: 2023, Last Modified: 15 Jan 2026. Venue: DAC 2023. License: CC BY-SA 4.0
Abstract: The rapid development of large-scale deep neural networks has created an urgent demand for efficient parallel training. Recently, bidirectional pipeline parallelism has been recognized as an effective approach for improving training throughput. This paper proposes MixPipe, a novel bidirectional pipeline parallelism scheme for efficiently training large-scale models in synchronous scenarios. Compared with previous proposals, MixPipe achieves a better balance between pipeline utilization and device utilization by flexibly regulating the number of micro-batches injected into the bidirectional pipelines at the beginning. MixPipe also features a mixed schedule that balances memory usage and further reduces the bubble ratio. Evaluation results show that, for Transformer-based language models (i.e., BERT and GPT-2 models), MixPipe improves training throughput by up to 2.39× over state-of-the-art synchronous pipeline approaches.