Keywords: Transformers; Dynamic Models; Pipeline Parallelism; LLMs
TL;DR: Automated load balancing of LLMs in distributed training
Abstract: To reduce the computational and memory costs of Large Language Models (LLMs), schemes that introduce dynamic behavior into training are increasingly emerging. Examples of dynamic models are: a) Mixture of Experts (MoEs), in which token routing affects the compute balance, b) gradual pruning of model parameters, c) dynamic layer freezing, d) dynamic sparse attention schemes, e) early exit of tokens as they pass through the model layers, and f) Mixture of Depths (MoDs) schemes, where tokens bypass blocks. One side effect that limits the practical value of dynamic models is the workload imbalance they introduce among workers, which in turn degrades the efficiency of distributed training. We propose a dynamic load balancing solution (DynMo), with a proof that it achieves the maximum reduction in imbalance, to adaptively maintain equal compute workloads across workers in pipeline parallelism. In addition, DynMo dynamically packs work into fewer workers, while sustaining training throughput, to release the idle workers back to the job manager. DynMo supports both single nodes with multiple GPUs and multi-node multi-GPU systems. In comparison to static distributed training solutions (Megatron-LM and DeepSpeed), DynMo accelerates the end-to-end training of dynamic GPT models by up to 1.23x (MoEs), 3.18x (parameter pruning), 2.23x (layer freezing), 4.02x (sparse attention), 4.52x (early exit), and 1.17x (MoDs). DynMo is available at https://anonymous.4open.science/r/DynMo-4D04/.
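To make the load-balancing idea in the abstract concrete, below is a minimal, hypothetical sketch (not DynMo's actual algorithm) of how contiguous transformer layers could be re-partitioned across pipeline stages from measured per-layer costs, and how the same measurement could be used to pack the work into fewer stages so surplus workers can be released. All function names and parameters here are illustrative assumptions.

```python
# Illustrative sketch only: a greedy, contiguous re-partitioning of layers across
# pipeline stages based on measured per-layer compute costs. This is NOT DynMo's
# algorithm; it only mirrors the behavior described in the abstract.

from typing import List


def partition_layers(layer_costs: List[float], num_stages: int) -> List[List[int]]:
    """Greedily assign contiguous layer blocks to `num_stages` pipeline stages,
    aiming for roughly equal per-stage cost (layers stay contiguous to preserve
    pipeline order)."""
    total = sum(layer_costs)
    target = total / num_stages
    stages: List[List[int]] = [[] for _ in range(num_stages)]
    stage, acc = 0, 0.0
    for i, cost in enumerate(layer_costs):
        remaining_layers = len(layer_costs) - i
        remaining_stages = num_stages - stage
        # Close the current stage once it reaches the target, but never leave a
        # later stage without at least one layer.
        if (stages[stage] and acc + cost > target
                and remaining_layers >= remaining_stages
                and stage < num_stages - 1):
            stage += 1
            acc = 0.0
        stages[stage].append(i)
        acc += cost
    return stages


def pack_stages(layer_costs: List[float], num_stages: int,
                max_stage_cost: float) -> int:
    """Pick the smallest stage count whose balanced partition keeps every stage
    under `max_stage_cost`, so the remaining workers could be released."""
    for k in range(1, num_stages + 1):
        parts = partition_layers(layer_costs, k)
        if all(sum(layer_costs[i] for i in p) <= max_stage_cost for p in parts):
            return k
    return num_stages


if __name__ == "__main__":
    # Hypothetical per-layer times (ms) after, e.g., pruning or freezing has made
    # some layers cheaper than others.
    costs = [2.0, 2.0, 1.0, 1.0, 1.0, 1.0, 2.0, 2.0]
    print(partition_layers(costs, 4))  # contiguous, roughly balanced layer blocks
    print(pack_stages(costs, 4, 4.0))  # fewer stages may suffice within the budget
```

In a real training loop, `layer_costs` would be refreshed periodically from runtime measurements, and the new partition would trigger a migration of layer parameters and optimizer state between stages; those mechanisms are outside this sketch.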
Primary Area: infrastructure, software libraries, hardware, systems, etc.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 5632