FlexParallel: Automatic Parallelism Tuner via Grey-Box Optimization for Training Giant Models

Wei Zhou; Kaiyang Guo; Linfeng Liu; Mengyang Zhang; Shoubo Feng; Xiong Tang; Naifu Zhang; Wei Guo; Zhitang Chen; Bing Wang; Gongyi Wang

20 Sept 2025 (modified: 11 Feb 2026); Submitted to ICLR 2026; CC BY 4.0

Keywords: foundation model training, distributed systems, automatic parallelism

Abstract: The rapid scaling of large language models (LLMs) has made parallel configuration tuning a central challenge. Most existing frameworks rely on labor-intensive manual tuning. Recent work attempts to automate this process and reduce reliance on expert intervention, but these approaches typically depend on highly accurate cost models. In practice, such models often fall short because exact performance modeling is difficult, which leads to suboptimal configurations. To address this limitation, this work introduces \textit{FlexParallel}, a framework that integrates an uncertainty-aware grey-box cost surrogate, a sample-efficient parallelism explorer, and an adaptive stopping criterion to automatically discover high-performance parallelism configurations. We evaluate FlexParallel through extensive experiments spanning diverse model architectures, parameter scales, sequence lengths, and cluster sizes. To the best of our knowledge, this work presents the first empirical evaluation of an automatic parallelism tuner on a cluster of up to 8,192 devices. Experimental results show that, with a limited number of exploration steps, FlexParallel achieves an average speedup of 1.06$\times$ over manual expert tuning, and up to 1.12$\times$ in the best case.
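The abstract names three components but does not describe how FlexParallel implements them. Purely as an illustration, the Python sketch below shows one generic way such pieces can fit together. It rests on assumptions, none of which come from the paper. The search space is taken to be the (dp, tp, pp) factorizations of the device count. `analytic_step_time` is a made-up toy cost model standing in for the white-box part. The uncertainty-aware correction is a simple kernel-weighted average of observed log residuals, which is a heuristic and neither a Gaussian process nor the paper's surrogate. The explorer measures the configuration with the lowest lower confidence bound (LCB), and the stopping rule halts once no unmeasured configuration's optimistic estimate beats the best measurement so far. `measure` simulates a profiling run that includes a hidden penalty the analytic model misses. All function and class names are hypothetical.

```python
import math
import random


def divisors(n):
    """All positive divisors of n, ascending."""
    return [d for d in range(1, n + 1) if n % d == 0]


def enumerate_configs(num_devices, tp_options=(1, 2, 4, 8)):
    """Hypothetical search space: every (dp, tp, pp) with dp * tp * pp == num_devices."""
    configs = []
    for tp in tp_options:
        if num_devices % tp:
            continue
        for pp in divisors(num_devices // tp):
            configs.append((num_devices // (tp * pp), tp, pp))
    return configs


def analytic_step_time(cfg, micro_batches=16):
    """Toy white-box cost model (arbitrary units): compute + pipeline bubble + comm."""
    dp, tp, pp = cfg
    bubble = (pp - 1) / micro_batches          # GPipe-style bubble fraction (p-1)/m
    return 1.0 + bubble + 0.05 * math.log2(tp) + 0.02 * math.log2(dp)


def measure(cfg, rng):
    """Stand-in for a profiling run; includes an effect the analytic model misses."""
    _, tp, _ = cfg
    penalty = 1.3 if tp > 4 else 1.0           # assumed: TP group spills across nodes
    return analytic_step_time(cfg) * penalty * (1.0 + rng.gauss(0.0, 0.01))


class GreyBoxSurrogate:
    """Analytic estimate corrected by a kernel-weighted log residual, with uncertainty."""

    def __init__(self, length_scale=1.0, prior_std=0.3):
        self.length_scale = length_scale
        self.prior_std = prior_std
        self.obs = []                          # list of (features, log residual)

    @staticmethod
    def features(cfg):
        return [math.log2(v) for v in cfg]

    def update(self, cfg, measured_time):
        residual = math.log(measured_time) - math.log(analytic_step_time(cfg))
        self.obs.append((self.features(cfg), residual))

    def predict(self, cfg):
        """Return (mean, std) of the predicted log step time."""
        x = self.features(cfg)
        w_sum, w_res = 0.0, 0.0
        for xi, ri in self.obs:
            d2 = sum((a - b) ** 2 for a, b in zip(x, xi))
            w = math.exp(-d2 / (2.0 * self.length_scale ** 2))
            w_sum += w
            w_res += w * ri
        mean_residual = w_res / (1.0 + w_sum)  # shrinks toward 0 far from any data
        std = self.prior_std / math.sqrt(1.0 + w_sum)
        return math.log(analytic_step_time(cfg)) + mean_residual, std


def tune(num_devices, budget=10, kappa=2.0, seed=0):
    """Measure LCB-selected configs; stop early once no config can plausibly win."""
    rng = random.Random(seed)
    configs = enumerate_configs(num_devices)
    model = GreyBoxSurrogate()
    measured = {}
    for _ in range(budget):
        candidates = [c for c in configs if c not in measured]
        if not candidates:
            break
        lcb = {}
        for c in candidates:
            mean, std = model.predict(c)
            lcb[c] = mean - kappa * std        # optimistic log-time estimate
        nxt = min(lcb, key=lcb.get)
        best_log = min((math.log(t) for t in measured.values()), default=math.inf)
        if lcb[nxt] >= best_log:               # adaptive stop: no plausible improvement
            break
        t = measure(nxt, rng)
        measured[nxt] = t
        model.update(nxt, t)
    best = min(measured, key=measured.get) if measured else None
    return best, measured


if __name__ == "__main__":
    best, measured = tune(64, budget=12)
    print("best (dp, tp, pp):", best, "after", len(measured), "measurements")
```

In this toy setup, the analytic term keeps predictions sensible before any measurement, and a few real runs let the residual term correct systematic errors such as the hidden tensor-parallel penalty. Configurations far from every measurement keep the full prior uncertainty, so they are skipped only when even their optimistic estimate is worse than the incumbent.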

Supplementary Material: pdf

Primary Area: infrastructure, software libraries, hardware, systems, etc.

Submission Number: 24012
