AdaPipe: Optimizing Pipeline Parallelism with Adaptive Recomputation and Partitioning

Published: 01 Jan 2024, Last Modified: 08 Mar 2025ASPLOS (3) 2024EveryoneRevisionsBibTeXCC BY-SA 4.0
Abstract: Large language models (LLMs) have demonstrated powerful capabilities, requiring huge memory with their increasing sizes and sequence lengths, thus demanding larger parallel systems. The broadly adopted pipeline parallelism introduces even heavier and unbalanced memory consumption. Recomputation is a widely employed technique to mitigate the problem but introduces extra computation overhead.This paper proposes AdaPipe, which aims to find the optimized recomputation and pipeline stage partitioning strategy. AdaPipe employs adaptive recomputation to maximize memory utilization and reduce the computation cost of each pipeline stage. A flexible stage partitioning algorithm is also adopted to balance the computation between different stages. We evaluate AdaPipe by training two representative models, GPT-3 (175B) and Llama 2 (70B), achieving up to 1.32× and 1.22× speedup on clusters with NVIDIA GPUs and Ascend NPUs respectively.
Loading

OpenReview is a long-term project to advance science through improved peer review with legal nonprofit status. We gratefully acknowledge the support of the OpenReview Sponsors. © 2025 OpenReview