AdaPipe: Optimizing Pipeline Parallelism with Adaptive Recomputation and Partitioning

Zhenbo Sun, Huanqi Cao, Yuanwei Wang, Guanyu Feng, Shengqi Chen, Haojie Wang, Wenguang Chen

Published: 01 Jan 2024, Last Modified: 08 Mar 2025ASPLOS (3) 2024EveryoneRevisionsBibTeXCC BY-SA 4.0

Abstract: Large language models (LLMs) have demonstrated powerful capabilities, requiring huge memory with their increasing sizes and sequence lengths, thus demanding larger parallel systems. The broadly adopted pipeline parallelism introduces even heavier and unbalanced memory consumption. Recomputation is a widely employed technique to mitigate the problem but introduces extra computation overhead.This paper proposes AdaPipe, which aims to find the optimized recomputation and pipeline stage partitioning strategy. AdaPipe employs adaptive recomputation to maximize memory utilization and reduce the computation cost of each pipeline stage. A flexible stage partitioning algorithm is also adopted to balance the computation between different stages. We evaluate AdaPipe by training two representative models, GPT-3 (175B) and Llama 2 (70B), achieving up to 1.32× and 1.22× speedup on clusters with NVIDIA GPUs and Ascend NPUs respectively.