CSIMD: Cross-Search Algorithm with Improved Multi-dimensional Dichotomy for Micro-Batch-Based Pipeline Parallel Training in DNN

Guangyao Zhou, Haocheng Lan, Yuanlun Xie, Wenhong Tian, Jiahong Qian, Teng Su

Published: 2024, Last Modified: 14 Mar 2026. Venue: Euro-Par (2) 2024. License: CC BY-SA 4.0
Abstract: Parallel training of large-scale networks has attracted attention from both the artificial intelligence and the high-performance distributed systems communities. One efficient form of parallelism is the micro-batch-based pipeline, e.g., GPipe. Based on GPipe, we derive a time-cost model built on a basic time function for each layer. The model considers computing time and communication time simultaneously and treats both as nonlinear in the batch size. Focusing on the optimal solutions for network division and data partition, we propose a Cross-Search algorithm with Improved Multi-dimensional Dichotomy (CSIMD). Through theoretical derivation, we prove that IMD has appreciable theoretical optimality. Extensive experiments on both CNN- and Transformer-based networks also demonstrate that the proposed CSIMD can obtain optimal network division and data partition schemes under GPipe parallelism: CSIMD achieves training speeds \(2.0\times\) and \(2.5\times\) faster than GPipe-R and GPipe-E, respectively, on CNNs, and \(1.5\times\) and \(1.6\times\) faster on Transformers.
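To make the optimization problem concrete, the following is a minimal sketch of what a micro-batch pipeline time-cost model and a search over network division and data partition could look like. It is not the paper's model or the CSIMD algorithm: the per-layer time function `layer_time` (a fixed overhead plus a power-law term in the micro-batch size), the standard makespan formula for identical micro-batches passing through sequential stages, and the exhaustive `brute_force` search are all assumptions chosen for illustration. Communication time and the backward pass are omitted.

```python
from itertools import combinations


def layer_time(cost, b, overhead=1.0, exponent=0.9):
    # Hypothetical per-layer time for a micro-batch of size b:
    # a fixed overhead plus a term that is nonlinear in b.
    return overhead + cost * b ** exponent


def stage_times(layer_costs, cuts, b):
    # Split the layer list at the cut indices and sum layer times per stage.
    bounds = [0, *cuts, len(layer_costs)]
    return [
        sum(layer_time(c, b) for c in layer_costs[lo:hi])
        for lo, hi in zip(bounds, bounds[1:])
    ]


def pipeline_time(times, m):
    # Makespan of m identical micro-batches flowing through the stages in
    # order: one full pass plus (m - 1) repeats of the slowest stage.
    return sum(times) + (m - 1) * max(times)


def brute_force(layer_costs, n_stages, batch, micro_counts):
    # Exhaustive search over cut positions (network division) and
    # micro-batch counts (data partition). Illustrative stand-in only.
    best = None
    for m in micro_counts:
        if batch % m:
            continue  # keep micro-batches equally sized
        for cuts in combinations(range(1, len(layer_costs)), n_stages - 1):
            t = pipeline_time(stage_times(layer_costs, cuts, batch / m), m)
            if best is None or t < best[0]:
                best = (t, cuts, m)
    return best


best_time, best_cuts, best_m = brute_force([1, 1, 1, 1], 2, 16, [1, 2, 4, 8, 16])
print(best_time, best_cuts, best_m)
```

Even in this toy setting the two decisions interact: more micro-batches shorten the pipeline bubble but pay the per-layer overhead more often, and the best cut depends on which stage is slowest at the chosen micro-batch size. Exhaustive enumeration grows quickly with the number of layers and stages, which is the kind of cost a dedicated search algorithm is meant to avoid.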