A Neural Network-Based Pipeline Parallel Strategy Solver for Heterogeneous Environments

Published: 2025, Last Modified: 14 Mar 2026, IJCNN 2025, CC BY-SA 4.0
Abstract: The widespread adoption of large language models (LLMs) has made distributed training increasingly important, especially pipeline parallelism, a fundamental technique for training ultra-large-scale LLMs. Current research in this field mainly employs combinatorial optimization algorithms such as dynamic programming. However, as the problem size grows, the search time of these methods becomes too high to produce strategies quickly in large-scale scenarios. Online optimization algorithms that combine neural networks with reinforcement learning require real-time interaction with the cluster environment to obtain feedback, resulting in high resource overhead and low search efficiency. Moreover, existing work lacks studies of heterogeneous computing environments, which are frequently used by small research teams. To address these issues, we design a novel Neural Network-based Pipeline Parallel strategy solver (NN-Piper) for heterogeneous environments. NN-Piper takes into account computational and communication costs, the number of pipeline stages, and the number of micro-batches, and it directly outputs a strategy that assigns specific devices to each pipeline stage. To avoid an online training process that requires interaction with the cluster environment, we propose the Virtual Contrastive Training Algorithm (VCTA), which trains NN-Piper efficiently without collecting large amounts of real data. After training, NN-Piper can be transferred to many different scenarios without further training or fine-tuning, and it can search for strategies within a few dozen milliseconds. Compared with the state-of-the-art method, NN-Piper improves training speed by 16-25% on average across different environments for transformer-based models.