Quartet: A Holistic Hybrid Parallel Framework for Training Large Language Models

Weigang Zhang, Biyu Zhou, Xing Wu, Chaochen Gao, Zhibing Liu, Xuehai Tang, Ruixuan Li, Jizhong Han, Songlin Hu

Published: 2024, Last Modified: 15 Jan 2026. Venue: Euro-Par (2) 2024. License: CC BY-SA 4.0

Abstract: Hybrid parallelism is popular in training large language models (LLMs). However, existing efforts have focused on optimizing individual strategies in hybrid parallelism, such as pipeline scheduling or device assignment, which limits overall training efficiency. This paper explores the intricate dependencies among four pivotal strategies (model scaling, model splitting, pipeline scheduling, and device assignment) and proposes Quartet, a holistic hybrid parallel framework for their joint optimization. The novelty lies in the formulation of parameterized pipeline scheduling and device assignment, alongside a pioneering analysis of the impact of model scaling on throughput. These provide the basis for orchestrating the four strategies within a unified framework to maximize overall training throughput efficiently. Evaluation results show that, for representative LLMs, Quartet improves training throughput by up to 2.16\(\times\) over state-of-the-art synchronous hybrid parallel approaches.