Elastic Load Balancing for Dynamic LLMs

22 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Supplementary Material: pdf
Primary Area: infrastructure, software libraries, hardware, etc.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Transformers; Dynamic Models; Pipeline Parallelism; LLMs
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: To reduce the computational and memory costs of Large Language Models (LLMs), families of training schemes that introduce dynamic training workloads are emerging. For example, in gradual pruning, model parameters are pruned during training to reduce resource requirements. A side effect, however, is that sparsification introduces workload imbalance among workers, which in turn degrades pipeline parallelism efficiency in distributed training. Similar issues arise in layer-freezing schemes. We propose load balancing algorithms that adaptively maintain equal compute workloads across workers and dynamically pack work onto fewer workers while sustaining training throughput. Our solution, DYNPIPE, supports both single nodes with multiple GPUs and multi-node systems. Our methods accelerate the training of dynamic GPT-class models by up to 1.29x on a single node with 8 A100 GPUs, and by up to 2.54x in a multi-node setting with hybrid data and pipeline parallelism on up to 720 A100 GPUs, over state-of-the-art production solutions used in training static LLMs. DYNPIPE is available at https://anonymous.4open.science/r/DynPipe-CC54
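
To illustrate the kind of rebalancing the abstract describes, the following is a minimal sketch (not the authors' DynPipe implementation): a greedy repartitioning of transformer layers across pipeline stages when per-layer costs change, e.g. after a gradual-pruning step, plus a simple check of how few workers the remaining work could be packed onto. All function names, parameters, and cost numbers here are illustrative assumptions.

from typing import List


def partition_layers(layer_costs: List[float], num_stages: int) -> List[List[int]]:
    """Greedily assign contiguous layers to stages so each stage's summed
    cost stays close to the ideal share (total_cost / num_stages)."""
    total = sum(layer_costs)
    target = total / num_stages
    stages: List[List[int]] = [[] for _ in range(num_stages)]
    stage, acc = 0, 0.0
    for i, cost in enumerate(layer_costs):
        remaining_layers = len(layer_costs) - i
        remaining_stages = num_stages - stage
        # Close the current stage when adding this layer would overshoot the
        # target more than stopping short would undershoot it, provided every
        # remaining stage can still receive at least one layer.
        overshoot = (acc + cost) - target
        undershoot = target - acc
        if (stages[stage] and overshoot > undershoot
                and stage < num_stages - 1
                and remaining_layers >= remaining_stages):
            stage += 1
            acc = 0.0
        stages[stage].append(i)
        acc += cost
    return stages


def pack_stages(layer_costs: List[float], num_stages: int, capacity: float) -> int:
    """Shrink the pipeline: return the smallest worker count whose most
    heavily loaded stage still fits under a per-worker compute budget."""
    for k in range(1, num_stages + 1):
        parts = partition_layers(layer_costs, k)
        heaviest = max(sum(layer_costs[i] for i in p) for p in parts)
        if heaviest <= capacity:
            return k
    return num_stages


if __name__ == "__main__":
    # Uniform per-layer costs before pruning; uneven costs after the first
    # eight layers have been sparsified (illustrative numbers only).
    after = [0.3] * 8 + [1.0] * 8
    print(partition_layers(after, 4))           # a stage of pruned layers absorbs more of them
    print(pack_stages(after, 4, capacity=4.0))  # the same work now fits on 3 workers

The sketch only captures the high-level idea of cost-driven repartitioning and worker packing; the actual system must also migrate parameters and optimizer state between GPUs and keep communication overlapped with compute, which this toy example omits.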
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 5256