TAP: Efficient Derivation of Tensor Parallel Plans for Large Neural Networks

Published: 16 May 2023 | Last Modified: 15 Jun 2023 | ASSYST Oral | Readers: Everyone
Keywords: distributed learning, machine learning system, model parallelism
TL;DR: We present a framework that drastically speeds up the process of deriving the tensor parallel schedule for large neural networks.
Abstract: Model parallelism is essential to train large language models efficiently. However, determining the optimal model parallel schedule for a given neural network can be slow and inefficient due to the vast search space. To address this challenge, we propose a tensor model parallelism framework called TAP, which automatically searches for the best data and tensor parallel schedules. Our approach is based on the observation that a neural network can be represented as a directed acyclic graph that contains only a limited set of frequent subgraphs. Building on this observation, we design a graph pruning algorithm that efficiently folds the search space. As a result, TAP runs at sub-linear complexity with respect to model size, which makes it a practical solution for large-scale networks. Experimental results demonstrate that TAP outperforms state-of-the-art automatic parallelism frameworks by $20-160\times$ in search time. Moreover, the performance of TAP's discovered schedules is competitive with expert-engineered ones. In summary, TAP provides a powerful and efficient tool for model parallelism that can help alleviate the burden of manual tuning.
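
The sketch below illustrates the folding idea described in the abstract, not TAP's actual implementation: identical layers in the model DAG (e.g., the repeated transformer blocks of a large language model) share one structural signature, so the expensive per-subgraph schedule search runs only once per unique subgraph and the result is reused for every repetition. Names such as `Subgraph`, `signature`, and `search_schedule` are illustrative assumptions, not TAP's API.

```python
# Minimal sketch of folding a model DAG by frequent subgraphs (assumed names,
# not TAP's implementation): search a parallel schedule once per unique
# subgraph signature, then reuse it for all repetitions.
from collections import defaultdict
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Subgraph:
    """A candidate unit of the model DAG (e.g., one transformer block)."""
    name: str
    op_types: tuple  # ordered operator types, used as a structural signature


def signature(sg: Subgraph) -> tuple:
    # Subgraphs with identical operator structure receive the same signature.
    return sg.op_types


def search_schedule(sg: Subgraph, num_devices: int) -> dict:
    # Placeholder for the expensive per-subgraph search: enumerate a sharding
    # choice (row- vs. column-parallel) per operator and return the first
    # feasible assignment. A real search would cost-model communication.
    choices = product(("row", "column"), repeat=len(sg.op_types))
    best = next(choices)
    return {op: shard for op, shard in zip(sg.op_types, best)}


def fold_and_plan(model: list, num_devices: int) -> dict:
    """Group subgraphs by signature, search once per group, reuse the result."""
    plan = {}
    groups = defaultdict(list)
    for sg in model:
        groups[signature(sg)].append(sg)
    for _, members in groups.items():
        schedule = search_schedule(members[0], num_devices)  # search once
        for sg in members:                                   # reuse everywhere
            plan[sg.name] = schedule
    return plan


if __name__ == "__main__":
    # 48 identical transformer blocks collapse into a single search call,
    # which is the source of the sub-linear scaling with model size.
    blocks = [Subgraph(f"block_{i}", ("attention", "mlp")) for i in range(48)]
    print(len(fold_and_plan(blocks, num_devices=8)))  # 48 plan entries, 1 search
```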
Workshop Track: ASSYST
Presentation: In-Person
Presenter Full Name: Ziji Shi
Presenter Email: zijishi@comp.nus.edu.sg
Presenter Bio: Ziji Shi is a third-year Ph.D. student from National University of Singapore. His research interests lie in distributed machine learning systems and high-performance computing.