Keywords: zeroth-order optimization, LLM, communication efficiency, memory efficiency, model parallelism, activation compression
Abstract: Model parallelism (MP) has emerged as a promising paradigm for distributed large language model (LLM) training across multiple computing nodes. Yet, almost all existing work on MP focuses on first-order methods, which face two persistent challenges: high communication costs from transmitting activations and gradients, and substantial memory overhead from caching them. Zeroth-order (ZO) methods, which avoid gradient computation and storage, can naturally alleviate both memory and communication bottlenecks, but they remain largely unexplored in MP for LLM fine-tuning. In this work, we propose ***SparQ***, a ZO MP framework with **Sp**lit layer **a**llocation info**r**med by **Q**uantization-induced activation sparsity, designed to reduce memory and communication costs. *SparQ* builds on three key components: (1) leveraging the gradient-free nature of ZO optimization to eliminate gradient storage and transmission, significantly reducing the memory and communication demands incurred by gradients; (2) applying quantization to induce activation sparsity that can be encoded with sparse representations; (3) strategically placing split layers at activation-sparse regions and using sparse representations to lower the communication cost of activations with almost no loss in model quality. Theoretically, *SparQ* achieves a sublinear convergence rate in non-convex settings, matching that of centralized ZO methods. Empirically, *SparQ* reduces GPU memory usage by over 3× and communication cost by $50\%$+ compared to state-of-the-art baselines, while maintaining comparable model performance.
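The abstract's three components can be illustrated with a minimal sketch. The snippet below is not the authors' implementation; it shows the general ideas under simple assumptions: a two-point SPSA-style ZO gradient estimate (forward passes only, no gradients stored or transmitted), uniform quantization that rounds small activations to zero (the toy activation vector with a few large outliers mimics the heavy-tailed activations common in LLMs), and a sparse (index, value) encoding of the quantized activations that would cross the split-layer boundary. All function names and the 4-bit setting are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def zo_grad_estimate(loss_fn, theta, eps=1e-3):
    """Two-point zeroth-order (SPSA-style) gradient estimate.

    Uses only forward evaluations of loss_fn, so no gradient is
    ever computed, cached, or communicated between nodes.
    """
    u = rng.standard_normal(theta.shape)          # random perturbation direction
    delta = loss_fn(theta + eps * u) - loss_fn(theta - eps * u)
    return (delta / (2 * eps)) * u                # scalar projected back onto u

def quantize(acts, n_bits=4):
    """Uniform symmetric quantization. Entries smaller than half a
    quantization step collapse to exactly zero, inducing sparsity."""
    scale = np.abs(acts).max() / (2 ** (n_bits - 1) - 1)
    return np.round(acts / scale) * scale

def sparse_encode(q_acts):
    """Encode only the nonzero entries as (index, value) pairs,
    which is what would be sent across the split layer."""
    idx = np.flatnonzero(q_acts)
    return idx, q_acts[idx]

# Toy activations: mostly small values plus a few large outliers.
acts = rng.standard_normal(1024)
acts[:8] *= 20.0

q = quantize(acts, n_bits=4)
idx, vals = sparse_encode(q)
ratio = len(idx) / acts.size   # fraction of entries actually transmitted
```

Because the outliers set the quantization scale, most small activations round to zero, so `ratio` is far below 1 and the sparse encoding transmits only a small fraction of the original tensor.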
Primary Area: foundation or frontier models, including LLMs
Submission Number: 568